FIELD OF THE INVENTION

The present invention relates generally to the field of neurological and
physiological dysfunctions associated with Alzheimer's Disease. More particularly,
the invention is concerned with the identification, isolation and cloning of the gene
which, when mutated, is associated with Alzheimer's Disease, as well as its transcript,
gene products, associated sequence information and neighbouring genes. The
present invention also relates to methods of diagnosing and detecting carriers
of the gene, Alzheimer's Disease diagnosis, gene therapy using recombinant
technologies, and therapy using the information derived from the DNA, the protein,
and the metabolic function of the protein.

BACKGROUND OF THE INVENTION 

In order to facilitate reference to various journal articles, a listing of the
articles is provided at the end of this specification.

Alzheimer's Disease (AD) is a degenerative disorder of the human central
nervous system characterized by progressive memory impairment and cognitive and
intellectual decline during mid to late adult life (Katzman, 1986). The disease is
accompanied by a constellation of neuropathological features, principal amongst which
are the presence of extracellular amyloid or senile plaques and the neurofibrillary
degeneration of neurons. The etiology of this disease is complex, although in some
families it appears to be inherited as an autosomal dominant trait. However, even
amongst these inherited forms of AD, there are at least three different genes which
confer inherited susceptibility to this disease (St George-Hyslop et al., 1990). The ε4
(Cys112Arg) allelic polymorphism of the Apolipoprotein E (ApoE) gene has been
associated with AD in a significant proportion of cases with onset late in life
(Saunders et al., 1993; Strittmatter et al., 1993). Similarly, a very small proportion
of familial cases with onset before age 65 years have been associated with mutations
in the β-amyloid precursor protein (APP) gene (Chartier-Harlin et al., 1991; Goate
et al., 1991; Murrell et al., 1991; Karlinsky et al., 1992; Mullan et al., 1992). A
third locus (AD3), associated with a larger proportion of cases with early onset AD,
has recently been mapped to chromosome 14q24.3 (Schellenberg et al., 1992; St
George-Hyslop et al., 1992; Van Broeckhoven et al., 1992).

Although chromosome 14q carries several genes which could be regarded as
candidate genes for the site of mutations associated with AD3 (e.g. cFOS,
alpha-1-antichymotrypsin and cathepsin G), most of these candidate genes have been
excluded on the basis of their physical location outside the AD3 region and/or the
absence of mutations in their respective open reading frames (Schellenberg, GD et
al., 1992; Van Broeckhoven, C et al., 1992; Rogaev et al., 1993; Wong et al.,
1993).

There have been several developments and commercial directions in respect
of the treatment and diagnosis of Alzheimer's Disease. Published PCT
application WO 94 23049 describes transfection of high molecular weight YAC DNA
into specific mouse cells. This method is used to analyze large gene complexes; for
example, the transgenic mice may have increased amyloid precursor protein gene
dosage, which mimics the trisomic condition that prevails in Down's syndrome, and
the method permits the generation of animal models with the β-amyloidosis prevalent
in individuals with Alzheimer's Disease. Published international application
WO 94 00569 describes transgenic non-human animals harbouring large transgenes,
such as a transgene comprising a human amyloid precursor protein gene. Such
animals can provide useful models of human genetic diseases such as Alzheimer's
Disease.
Canadian Patent application 2096911 describes a nucleic acid coding for an
amyloid precursor protein-cleaving protease, which is associated with Alzheimer's
Disease and Down's syndrome. The genetic information may be used to diagnose
Alzheimer's Disease. The genetic information was isolated from chromosome 19.
Canadian patent application 2071105 describes detection and treatment of inherited
or acquired Alzheimer's Disease by the use of YAC nucleotide sequences. The YACs
are identified by the numbers 23CB10, 28CA12 and 26FF3.

U.S. Patent 5297562 describes detection of Alzheimer's Disease based on the
presence of two or more copies of chromosome 21. Treatment involves methods for
reducing the proliferation of chromosome 21 trisomy. Canadian Patent application
2054302 describes monoclonal antibodies which recognize a human brain cell nucleus
protein encoded by chromosome 21 and which are used to detect changes of expression
due to Alzheimer's Disease or Down's Syndrome. The monoclonal antibody is specific
to a protein encoded by human chromosome 21 and is linked to large pyramidal cells
of human brain tissue.

By extensive effort and a unique approach to investigating the AD3 region of
chromosome 14q, the Alzheimer's related membrane protein (ARMP) gene has been
isolated, cloned and sequenced from within the AD3 region on chromosome 14q24.3.
In addition, direct sequencing of RT-PCR products spanning this 3.0 kb cDNA
transcript, isolated from affected members of at least eight large pedigrees linked to
chromosome 14, has led to the discovery of missense mutations in each of these
different pedigrees. These mutations are absent in normal chromosomes. It has now
been established that the ARMP gene is causative of familial Alzheimer's Disease
type AD3. In realizing this link, it is understood that mutations in this gene can be
associated with other cognitive, intellectual or psychological diseases such as cerebral
hemorrhage, schizophrenia, depression, mental retardation and epilepsy. These
phenotypes are present in these AD families, and these phenotypes have also been
seen with mutations of the APP gene. The Amyloid Precursor Protein (APP) gene is
also associated with inherited Alzheimer's Disease. The identification of both normal
and mutant forms of the ARMP gene and gene products has allowed for the
development of screening and diagnostic tests for ARMP utilizing nucleic acid probes
and antibodies to the gene product. Through interaction with the defective gene
product and the pathway in which this gene product is involved, gene therapy,
manipulation and delivery are now made possible.

SUMMARY OF THE INVENTION

Various aspects of the invention are summarized as follows. In accordance
with a first aspect of the invention, a purified mammalian polynucleotide is provided
which codes for Alzheimer's related membrane protein (ARMP). The polynucleotide
has a sequence which is the functional equivalent of the DNA sequence of the ATCC
deposit made on April 28, 1995. The mammalian polynucleotide may be
in the form of DNA, genomic DNA, cDNA, mRNA and various fragments and
portions of the gene sequence encoding ARMP. The mammalian DNA is conserved
in many species, including humans and rodents, for example mice. The mouse
sequence encoding ARMP has greater than 95% homology with the human sequence
encoding the same protein.
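The 95% figure above is a sequence-similarity percentage. As a minimal sketch only, assuming the comparison is scored as percent identity over a pre-computed, equal-length alignment (the specification does not state how homology was scored), such a figure could be computed as follows; the function name and the gap convention are illustrative assumptions.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned, ungapped positions at which two sequences agree.

    Assumes the two sequences are already aligned to the same length; a '-'
    in either sequence marks a gap and that column is left out of the count.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared
```

For example, `percent_identity("ATGACCGAGT", "ATGACTGAGT")` gives 90.0, since 9 of the 10 aligned positions match; these short strings are made up and are not ARMP sequence.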

Purified human nucleotide sequences which encode mutant ARMP have
mutations at nucleotide positions i) 685, A→C; ii) 737, A→G; iii) 986, C→A;
iv) 1105, C→G; v) 1478, G→A; vi) 1027, C→T; vii) 1102, C→T; and viii) 1422, C→G
of Sequence ID No: 1, as well as in the cDNA sequence of a further human clone
identified as Sequence ID NO: 132.
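The mutations above are given as nucleotide positions in Sequence ID No: 1, whereas the protein substitutions are numbered by codon. Relating the two requires the position of the first base of the initiation codon in Sequence ID No: 1, which is not stated in this section. The sketch below therefore treats that start position as an input; `codon_of` and `cds_start` are hypothetical names.

```python
def codon_of(nt_pos: int, cds_start: int) -> tuple:
    """Map a 1-based nucleotide position to (codon number, base within codon).

    cds_start is the 1-based position of the first base (the A of ATG) of the
    initiation codon in the same sequence; it is a parameter, not a known constant.
    """
    if nt_pos < cds_start:
        raise ValueError("position lies upstream of the coding sequence")
    offset = nt_pos - cds_start
    return offset // 3 + 1, offset % 3 + 1
```

With a hypothetical `cds_start` of 250, position 685 falls at the first base of codon 146 and position 737 at the second base of codon 163, which would agree with the M146L and H163R numbering; the actual start position should be read from the sequence listing before relying on such a mapping.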

The nucleotide sequences encoding ARMP have an alternative splice form in
the gene's open reading frame. The human cDNA sequence which codes for ARMP
has Sequence ID No. 1, as well as Sequence ID NO: 132 as sequenced in another
human clone. The mouse sequence which encodes ARMP has Sequence ID No. 3, as
well as SEQ ID NO: 134 derived from a further clone containing the entire coding
region. Various DNA and RNA probes and primers may be made from appropriate
polynucleotide lengths selected from the sequences. Portions of the sequence also
encode antigenic determinants of the ARMP.

Suitable expression vectors comprising the nucleotide sequences are provided
along with suitable host cells transfected with such expression vectors.

In accordance with another aspect of the invention, purified mammalian
Alzheimer's related membrane protein is provided. The purified protein has an amino
acid sequence encoded by a polynucleotide sequence as identified above, which for the
human is Sequence ID NO: 2 and SEQ ID NO: 133 (derived from another clone).
The mouse amino acid sequence is defined by Sequence ID No. 2 and Sequence ID
No. 4, the latter being translated from another clone containing the entire coding
region. The purified protein may have substitution mutations selected from the group
consisting of the following positions identified in Sequence ID No: 2 and Sequence
ID NO: 133:

i) M146L

ii) H163R

iii) A246E

iv) L286V

v) C410Y

vi) A260V

vii) A285V

viii) L392V
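Each entry above uses the one-letter notation: reference residue, codon number, substituted residue. A single-base missense change accounts for such an entry only if the altered codon translates to the new residue under the standard genetic code. The following is a sketch of that check; the codons passed in the usage note are example codons for each reference residue, chosen for illustration, because the actual codons at these positions are not given in this section.

```python
import itertools
import re

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}

def parse_substitution(label: str):
    """Split 'M146L' or 'C 410 Y' into ('M', 146, 'L')."""
    m = re.fullmatch(r"([A-Z])\s*(\d+)\s*([A-Z])", label.strip())
    if m is None:
        raise ValueError(f"unrecognised substitution: {label!r}")
    return m.group(1), int(m.group(2)), m.group(3)

def explains(label: str, ref_codon: str, pos_in_codon: int, new_base: str) -> bool:
    """True if replacing base `pos_in_codon` (1-3) of ref_codon yields the labelled change."""
    ref_aa, _, alt_aa = parse_substitution(label)
    ref_codon = ref_codon.upper()
    mutant = ref_codon[:pos_in_codon - 1] + new_base.upper() + ref_codon[pos_in_codon:]
    return CODON_TABLE[ref_codon] == ref_aa and CODON_TABLE[mutant] == alt_aa
```

For instance, `explains("M146L", "ATG", 1, "C")` is true (ATG, Met, becomes CTG, Leu), whereas the same base change at the second position gives ACG (Thr) and is rejected.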

In accordance with another aspect of the invention, polyclonal antibodies are
provided which are raised to specific predicted sequences of the ARMP protein.
Polypeptides of at least six amino acid residues are provided. The polypeptides of six
or greater amino acid residues may define antigenic epitopes of the ARMP.
Monoclonal antibodies having suitably specific binding affinity for the antigenic regions
of the ARMP are prepared by use of corresponding hybridoma cell lines. In addition,
other polyclonal antibodies having suitable specific binding affinities for antigenic
regions of the ARMP may be prepared by inoculation of animals with suitable
peptides or holoprotein.
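Candidate peptides of at least six residues can be enumerated by sliding a fixed-length window along the protein sequence. This is only a sketch of that enumeration: it does not predict which windows are actually antigenic, and the sequence in the usage note is an arbitrary toy string, not ARMP.

```python
def candidate_peptides(protein: str, length: int = 6, step: int = 1) -> list:
    """Return (1-based start, peptide) for each window of `length` residues."""
    if length < 6:
        raise ValueError("peptides of at least six residues are contemplated")
    if step < 1:
        raise ValueError("step must be positive")
    seq = protein.upper()
    return [(i + 1, seq[i:i + length])
            for i in range(0, len(seq) - length + 1, step)]
```

`candidate_peptides("MKTAYIAKQR")` returns the five 6-mers starting at positions 1 to 5, from `(1, "MKTAYI")` to `(5, "YIAKQR")`; a larger `step` gives fewer, less overlapping peptides.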

In accordance with another aspect of the invention, an isolated DNA molecule 
is provided which codes for E5-1 protein. 

In accordance with another aspect of the invention, purified E5-1 protein is 
provided, having amino acid Sequence ID No: 137. 
In accordance with another aspect of the invention a bioassay is provided for
determining if a subject has a normal or mutant ARMP, where the bioassay comprises:

providing a biological sample from the subject; and

conducting a biological assay on the sample to detect a normal or mutant gene
sequence coding for ARMP, a normal or mutant ARMP amino acid sequence, or a
normal or defective protein function.
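One way to carry out the nucleic-acid part of such an assay is the restriction-digest approach illustrated in Figure 3, where a mutation creates or destroys a recognition site for BspHI, NlaIII, DdeI or PvuII. The sketch below checks an amplicon sequence for those sites and interprets the result. The recognition sequences are the standard ones for these enzymes; the function names and the toy amplicons in the test are hypothetical and do not reproduce the real ARMP flanking sequence.

```python
import re

# Recognition sequences (5'->3') of the enzymes named in Figure 3; N = any base.
# All four are palindromic, so scanning one strand finds every site.
RECOGNITION_SITES = {
    "BspHI": "TCATGA",
    "NlaIII": "CATG",
    "DdeI": "CTNAG",
    "PvuII": "CAGCTG",
}

def count_sites(seq: str, enzyme: str) -> int:
    """Count (possibly overlapping) recognition sites for `enzyme` in `seq`."""
    core = RECOGNITION_SITES[enzyme].replace("N", "[ACGT]")
    # A zero-width lookahead lets overlapping matches be counted.
    return len(re.findall(f"(?=({core}))", seq.upper()))

def call_allele(amplicon: str, enzyme: str, site_marks_mutant: bool) -> str:
    """Score an amplicon as 'mutant' or 'normal' from whether the enzyme would cut it."""
    has_site = count_sites(amplicon, enzyme) > 0
    return "mutant" if has_site == site_marks_mutant else "normal"
```

The `site_marks_mutant` flag reflects that, in Figure 3, loss of the site marks the mutant allele in some assays (BspHI, NlaIII) while gain of the site marks it in others (DdeI).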

In accordance with another aspect of the invention, a process is provided for
producing ARMP comprising culturing one of the above described transfected host
cells under suitable conditions to produce the ARMP by expressing the DNA
sequence. Alternatively, ARMP may be isolated from mammalian cells in which the
ARMP is normally expressed.

In accordance with another aspect of the invention, a therapeutic composition is
provided comprising ARMP and a pharmaceutically acceptable carrier.

In accordance with another aspect of the invention, a recombinant vector is provided
for transforming a mammalian tissue cell to express therapeutically effective amounts
of ARMP in the cells. The vector is normally delivered to the cells by a
suitable vehicle. Suitable vehicles include vaccinia virus, adenovirus, adeno-associated
virus, retrovirus, liposome transport, neurotropic viruses, Herpes simplex
virus and other vector systems.

In accordance with another aspect of the invention, a method is provided of treating
a patient deficient in normal ARMP, comprising administering to the patient a
therapeutically effective amount of the protein targeted at a variety of patient cells
which normally express ARMP. The extent of administration of normal ARMP is
sufficient to override any effect the presence of the mutant ARMP may have on the
patient. As an alternative to protein, suitable ligands and therapeutic agents such as
small molecules and other drug agents may be suitable for drug therapy designed to
replace the protein and defective ARMP, to displace mutant ARMP, or to suppress its
formation.

In accordance with another aspect of the invention, an immunotherapy for
treating a patient having Alzheimer's Disease comprises treating the patient with
antibodies specific to the mutant ARMP to reduce biological levels or activity of the
mutant ARMP in the patient. To facilitate such therapy, a vaccine
composition may be provided for evoking an immune response in a patient with
Alzheimer's Disease, where the composition comprises a mutant ARMP and a
pharmaceutically acceptable carrier with or without a suitable excipient. The
antibodies developed specific to the mutant ARMP could be used to target
appropriately encapsulated drugs/molecules to specific cellular/tissue sites. Therapies
utilizing specific ligands which bind to ARMP of either mutant or wild type, and which
augment normal function of ARMP in membranes and/or cells or inhibit the
deleterious effect of the mutant protein, are also made possible.

In accordance with another aspect of the invention, a transgenic animal model
for Alzheimer's Disease is provided which has the mammalian polynucleotide sequence
with at least one mutation which, when expressed, results in mutant ARMP in the
animal cells and thereby manifests a phenotype. The phenotype in the animal may
differ from that in humans; for example, the human Prion gene when over-expressed
in rodent peripheral nervous system and muscle cells causes a quite different response
in the animal than in the human. The animal may be a rodent and is preferably a
mouse, but may also be other animals including rat, pig, Drosophila melanogaster and
C. elegans (nematode), all of which are used for transgenic models. Yeast cells can
also be used in which the ARMP sequence is expressed from an artificial vector.

In accordance with another aspect of the invention, a transgenic mouse model
for Alzheimer's Disease has the mouse gene encoding ARMP, or human or murine
homologues thereof, mutated to manifest the symptoms. The transgenic mouse may
exhibit symptoms of cognitive memory or behavioural disturbances. In addition or
alternatively, the symptoms may appear as other cellular or tissue disorders, such as
in mouse liver, kidney, spleen or bone marrow and other organs in which the ARMP
gene product is normally expressed.

In accordance with another aspect of the invention, the protein can be used as
a starting point for rational drug design to provide ligands, therapeutic drugs or other
types of small chemical molecules.
BRIEF DESCRIPTION OF THE DRAWINGS

Various aspects of the invention are described hereinafter with respect to the 
drawings wherein: 

Figure 1a. Genomic physical and transcriptional map of the AD3 region of
chromosome 14. Genetic map inter-marker genetic distances, averaged for male and
female meiosis, are indicated in centiMorgans.

Figure 1b. The constructed physical contig map of overlapping genomic DNA
fragments cloned into YACs spanning a FAD locus on chromosome 14q.

Figure 1c. Regions of interest within the constructed physical contig map.

Figure 1d. Transcriptional map illustrating physical locations of the 19 independent
longer cDNA clones.

Figure 2. Automated fluorescent sequencing chromatograms showing the nucleotide
changes which alter the codon, and hence the encoded amino acid, at:

(a) Met 146 Leu

(b) His 163 Arg

(c) Ala 246 Glu

(d) Leu 286 Val

(e) Cys 410 Tyr

Figure 3(a). Restriction fragments of the Met 146 Leu mutation using BspHI
restriction enzyme in AD patients. Absence of a restriction site indicates a mutant
allele.

Figure 3(b). Presence of the His 163 Arg mutation detected by NlaIII restriction
digestion. Absence of restriction indicates a mutant allele.

Figure 3(c). Presence of the Ala 246 Glu mutation in AD patients using DdeI
restriction enzyme. Presence of the mutant allele leads to restriction.

Figure 3(d). Presence of the Cys 410 Tyr mutation in AD patients as assayed using
allele-specific oligonucleotides.

Figure 3(e). Presence of the Leu 286 Val mutation in AD patients using PvuII
restriction enzyme.

Figure 4. RNA blot demonstrating the expression of ARMP mRNA in
different regions of the brain including amygdala, caudate, corpus callosum,
hippocampus, hypothalamus, substantia nigra, subthalamic nucleus and thalamus.

Figure 5. RNA blot demonstrating the expression of ARMP mRNA in a
variety of tissues including heart, brain, placenta, lung, liver, skeletal muscle, kidney
and pancreas.

Figure 6a. Hydropathy plot of the putative ARMP protein. 

Figure 6b. A model for the structural organization of the putative ARMP protein.
Roman numerals depict the transmembrane domains. Putative glycosylation sites are
indicated as asterisks, and most of the phosphorylation sites are located on the same
membrane face as the two acidic hydrophilic loops. The MAP kinase site is present
at residue 115 and the PKC site at residue 114. FAD mutation sites are indicated by
horizontal arrows.

Figure 7 shows transcription of the E5-1 gene, investigated by hybridization of the
E5-1 cDNA to Northern blots of mRNA from multiple human brain regions (Panel
A) and several peripheral tissues (Panel C). In brain, the E5-1 transcript is of a
lower molecular weight and lesser abundance than the ARMP transcript (Panel B)
hybridized to the same blot using identical conditions.

Figure 8 shows the predicted structure of the E5-1 protein. 
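A hydropathy plot such as that of Figure 6a is conventionally computed by averaging a per-residue hydrophobicity scale over a sliding window. The sketch below assumes the Kyte-Doolittle scale with a 19-residue window and a threshold of 1.6 for flagging candidate transmembrane segments; the figure description does not state which scale or window was used, so these values are assumptions.

```python
# Kyte-Doolittle hydropathy values (assumed scale; see lead-in).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein: str, window: int = 19) -> list:
    """Return (1-based window centre, mean hydropathy) for every full window."""
    seq = protein.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    half = window // 2
    profile = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        profile.append((start + half + 1, mean))
    return profile

def flagged_centres(protein: str, window: int = 19, threshold: float = 1.6) -> list:
    """Window centres whose mean hydropathy exceeds `threshold` (candidate membrane spans)."""
    return [centre for centre, score in hydropathy_profile(protein, window)
            if score > threshold]
```

As a toy check, a sequence of ten lysines, nineteen leucines and ten lysines yields flagged centres 15 to 25, clustered around the hydrophobic leucine run; counting such clusters along a real sequence is how putative transmembrane domains like those numbered in Figure 6b are usually read off a hydropathy plot.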

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

In order to facilitate review of the various embodiments of the invention and
an understanding of various elements and constituents used in making and using the
invention, the following definitions of terms used in the description are provided:

Alzheimer Related Membrane Protein gene (ARMP gene) - the chromosome 14
gene which, when mutated, is associated with familial Alzheimer's Disease and/or
other inheritable disease phenotypes (eg. cerebral hemorrhage, mental retardation,
schizophrenia, psychosis, and depression). This definition is understood to include
the various sequence polymorphisms that exist, wherein nucleotide substitutions in the
gene sequence do not affect the essential function of the gene product, as well as
functional equivalents of the nucleotide sequences of Sequence ID No. 1, Sequence
ID NO: 132, Sequence ID No: 3 and Sequence ID NO: 134. This term primarily
relates to an isolated coding sequence, but can also include some or all of the flanking
regulatory elements and/or introns. The term ARMP gene includes the gene in other
species analogous to the human gene which, when mutated, is associated with
Alzheimer's Disease.

Alzheimer Related Membrane Protein (ARMP) - the protein encoded by the ARMP
gene. The preferred source of protein is the mammalian protein as isolated from
humans or animals. Alternatively, functionally equivalent proteins may exist in
plants, insects and invertebrates (such as C. elegans). The protein may be produced
by recombinant organisms, or chemically or enzymatically synthesized. This
definition is understood to include functional variants such as the various polymorphic
forms of the protein wherein amino acid substitutions or deletions within the amino
acid sequence do not affect the essential functioning of the protein, or its structure.
It also includes functional fragments of ARMP.
Mutant ARMP gene - the ARMP gene containing one or more mutations which lead
to Alzheimer's Disease and/or other inheritable disease phenotypes (eg. cerebral
hemorrhage, mental retardation, schizophrenia, psychosis, and depression). This
definition is understood to include the various mutations that exist, wherein nucleotide
substitutions in the gene sequence affect the essential function of the gene product,
as well as mutations of functional equivalents of the nucleotide sequences of Sequence
ID No. 1, Sequence ID NO: 132, Sequence ID No: 3 and Sequence ID NO: 134 (and
of the corresponding amino acid sequences). This term primarily relates to an isolated
coding sequence, but can also include some or all of the flanking regulatory elements
and/or introns.

Mutant ARMP - a mammalian protein that is highly analogous to ARMP in terms
of primary structure, but wherein one or more amino acid deletions and/or
substitutions result in impairment of its essential function, so that mammals,
especially humans, whose ARMP producing cells express mutant ARMP rather than
the normal ARMP, demonstrate the symptoms of Alzheimer's Disease and/or other
relevant inheritable phenotypes (eg. cerebral hemorrhage, mental retardation,
schizophrenia, psychosis, and depression).

mARMP gene - mouse gene analogous to the human ARMP gene.

Functional equivalent - as used in describing gene sequences and amino acid
sequences, means that a recited sequence need not be identical to the definitive
sequence of the Sequence ID Nos but need only provide a sequence which functions
biologically and/or chemically as the equivalent of the definitive sequence. Hence
sequences which correspond to a definitive sequence may also be considered as
functionally equivalent sequences.

mARMP - mouse Alzheimer related membrane protein, analogous to the human
ARMP, encoded by the mARMP gene. This definition is understood to include the
various polymorphic forms of the protein wherein amino acid substitutions or
deletions of the sequence do not affect the essential functioning of the protein, or
its structure.

Mutant mARMP - a mouse protein which is highly analogous to mARMP in terms
of primary structure, but wherein one or more amino acid deletions and/or
substitutions result in impairment of its essential function, so that mice whose
mARMP-producing cells express mutant mARMP rather than the normal mARMP
demonstrate the symptoms of Alzheimer's Disease and/or other relevant inheritable
phenotypes, or other phenotypes and behaviours as manifested in mice.

ARMP carrier - a mammal in apparent good health whose chromosomes contain a
mutant ARMP gene that may be transmitted to the offspring and who will develop
Alzheimer's Disease in mid to late adult life.

Missense mutation - A mutation of nucleic acid sequence which alters a codon to that 
of another amino acid, causing an altered translation product to be made. 

Pedigree - In human genetics, a diagram showing the ancestral relationships and
transmission of genetic traits over several generations in a family.

E5-1 gene - the chromosome 1 gene which shows homology to the ARMP gene and
which when mutated is associated with familial Alzheimer's disease and/or other
inheritable disease phenotypes. This definition is understood to include the various
sequence polymorphisms that exist, wherein nucleotide substitutions in the gene
sequence do not affect the essential function of the gene product, as well as functional
equivalents of the nucleotide Sequence ID No: 136. This term also includes the gene
in other species analogous to the human gene described herein.
E5-1 protein - the protein encoded by the E5-1 gene. This term includes the protein
of Sequence ID No: 137 and also functional variants such as the various polymorphic
and splice variant forms of the protein wherein amino acid substitutions or deletions
within the amino acid sequence do not affect the essential functioning of the protein.
The term also includes functional fragments of the protein.

Mutant E5-1 gene - the E5-1 gene containing one or more mutations which lead to
Alzheimer's Disease. This term is understood to include the various mutations that
exist, wherein nucleotide substitutions in the gene sequence affect the essential
function of the gene product.

Mutant E5-1 protein - a protein analogous to E5-1 protein but wherein one or more
amino acid deletions and/or substitutions result in impairment of its essential function
such that mammals, especially humans, whose E5-1-producing cells express mutant
E5-1 protein demonstrate the symptoms of Alzheimer's Disease.

Linkage analysis - analysis of co-segregation of a disease trait or disease gene with
polymorphic genetic markers of defined chromosomal location.

hARMP gene - human ARMP gene

ORF - open reading frame.

PCR - polymerase chain reaction.

contig - contiguous overlapping cloned regions

YAC - yeast artificial chromosome

RT-PCR - reverse transcription polymerase chain reaction.

SSR - simple sequence repeat polymorphism.

The present invention is concerned with the identification and sequencing of
the mammalian ARMP gene in order to gain insight into the cause and etiology of
familial Alzheimer's Disease. From this information, screening methods and
therapies for the diagnosis and treatment of the disease can be developed. The gene
has been identified, cDNA isolated and cloned, and its transcripts and gene products
identified and sequenced. During such identification of the gene, considerable
sequence information has also been developed on the introns of the ARMP gene, on
flanking untranslated and signal sequences, and on neighbouring genes in the AD3
chromosome region. Direct sequencing of overlapping RT-PCR products spanning the
human gene, isolated from affected members of large pedigrees linked to chromosome
14, has led to the discovery of missense mutations which co-segregate with the disease.

Although it is generally understood that Alzheimer's Disease is a neurological
disorder, most likely localized to the brain, expression of ARMP has been found in a
variety of human tissues such as heart, brain, placenta, lung, liver, skeletal muscle,
kidney and pancreas. Although this gene is expressed widely, the clinically apparent
phenotype exists in brain, although it is conceivable that biochemical phenotypes may
exist in these other tissues. As with other genetic diseases such as Huntington's
Disease and APP-linked Alzheimer's Disease, the clinical disease manifestation may
reflect different biochemistries of different cell types and tissues (which stem from
genetics and the protein). Such findings suggest that AD may not be solely a
neurological disorder but may also be a systemic disorder, hence requiring alternative
therapeutic strategies which may be targeted to other tissues or organs, either
generally or in addition to or separately from neuronal or brain tissues.

The ARMP mutations identified have been related to Alzheimer's Disease
pathology. With the identification and sequencing of the gene and the gene product,
probes and antibodies raised to the gene product can be used in a variety of
hybridization and immunological assays to screen for and detect the presence of either
a normal or mutated gene or gene product.

Patient therapy through removal or blocking of the mutant gene product, as
well as supplementation with the normal gene product by amplification, by genetic
and recombinant techniques or by immunotherapy, can now be achieved. Collection
or modification of the defective gene product by protein treatment, immunotherapy
(using antibodies to the defective protein) or knock-out of the mutated gene is now
also possible. Familial Alzheimer's Disease could also be controlled by gene therapy
in which the gene defect is corrected in situ, or by the use of recombinant or other
vehicles to deliver to the affected cells of the patient a DNA sequence capable of
expressing the normal gene product, or a deliberately mutated version of the gene
product whose effect counterbalances the deleterious consequences of the disease
mutation.

The present invention is also concerned with the identification and sequencing
of a second gene, the E5-1 gene on chromosome 1, which is associated with familial
Alzheimer's Disease.

Disease mechanism insights and therapies analogous to those described above
in relation to the ARMP gene will be available as a result of the identification and
isolation of the gene.

Isolating the Human ARMP Gene 
Genetic mapping of the AD3 locus. 
After the initial regional mapping of the AD3 gene locus to 14q24.3 near the
anonymous microsatellite markers D14S43 and D14S53 (Schellenberg, GD et al.,
1992; St George-Hyslop, P et al., 1992; Van Broeckhoven, C et al., 1992), twenty-one
pedigrees segregating AD as a putative autosomal dominant trait (St
George-Hyslop, P et al., 1992) were used to investigate the segregation of 18 additional
genetic markers from the 14q24.3 region which had been organized into a high
density genetic linkage map (Figure 1b) (Weissenbach et al., 1992; Gyapay et al.,
1994). Pairwise maximum likelihood analyses previously published confirmed
substantial cumulative evidence for linkage between FAD and all of these markers
(Table 1). However, much of the genetic data supporting linkage to these markers
were derived from six large early onset pedigrees, FAD1 (Nee et al., 1983), FAD2
(Frommelt et al., 1991), FAD3 (Goudsmit et al., 1981; Pollen, 1993), FAD4 (Foncin
et al., 1985), TOR1.1 (Bergamini, 1991) and 603 (Pericak-Vance et al., 1988), each
of which provides evidence for linkage to at least one anonymous genetic marker from
14q24.3 (St George-Hyslop, P et al., 1992).
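The pairwise (two-point) analyses referred to above are usually summarized as LOD scores. In the simplest textbook case, assuming phase-known, fully informative meioses with R recombinants among N, the LOD score at recombination fraction θ is Z(θ) = R·log10 θ + (N − R)·log10(1 − θ) + N·log10 2, maximized over θ. Analyses of real pedigrees are considerably more involved (missing genotypes, unknown phase, penetrance models), so this sketch only illustrates the quantity being maximized; the θ grid and the counts in the usage note are hypothetical.

```python
import math

def lod(theta: float, recombinants: int, meioses: int) -> float:
    """Two-point LOD score for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses")
    if theta == 0.0:
        # Under complete linkage any recombinant has zero likelihood.
        return -math.inf if recombinants else meioses * math.log10(2)
    return (recombinants * math.log10(theta)
            + (meioses - recombinants) * math.log10(1.0 - theta)
            + meioses * math.log10(2))  # i.e. divide by 0.5**N, the no-linkage likelihood

def max_lod(recombinants: int, meioses: int,
            grid=(0.0, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5)):
    """Return (maximum LOD, theta at which it occurs) over a grid of theta values."""
    return max((lod(t, recombinants, meioses), t) for t in grid)
```

For 10 such meioses with no recombinants the maximum is 10·log10 2 ≈ 3.01 at θ = 0; with one recombinant the maximum on this grid is about 1.60 at θ = 0.1, showing how a single recombinant lowers the evidence for tight linkage.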

In order to more precisely define the location of the AD3 gene relative to the
known locations of the genetic markers from 14q24.3, recombinational landmarks
were sought by direct inspection of the raw haplotype data only from genotyped
affected members of the six pedigrees showing definitive linkage to chromosome 14.
This selective strategy necessarily discards data from the reconstructed genotypes of
deceased affected members as well as from elderly asymptomatic members of the large
pedigrees, and takes no account of the smaller pedigrees of uncertain linkage status.
However, the strategy is sound because it avoids the acquisition of potentially
misleading genotype data arising either from errors in the reconstructed genotypes of
deceased affected members (due to non-paternity or sampling errors) or from the
inclusion of unlinked pedigrees.

Inspection of the haplotype data for affected members of the six large pedigrees
whose genotypes were directly determined revealed obligate recombinants at D14S48
and D14S53, and at D14S258 and D14S63. The single recombinant at D14S53, which
defines a telomeric boundary for the FAD region, occurred in the same AD-affected
subject of the FAD1 pedigree who had previously been found to be recombinant at
several other markers located telomeric to D14S53, including D14S48 (St
George-Hyslop, P et al., 1992). Conversely, the single recombinant at D14S258, which
marks a centromeric boundary of the FAD region, occurred in an affected member of
the FAD3 pedigree who was also recombinant at several other markers centromeric to
D14S258, including D14S63. Both recombinant subjects had unequivocal evidence of
Alzheimer's Disease, confirmed through standard clinical tests for the illness in other
affected members of their families, and the genotypes of both recombinant subjects
were informative and co-segregating at multiple loci within the interval centromeric to
D14S53 and telomeric to D14S258.
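The boundary reasoning above, in which the FAD1 recombinant fixes a telomeric boundary and the FAD3 recombinant a centromeric one, can be expressed as a small computation. This sketch assumes a marker order running from telomere to centromere and, for each informative affected subject, the set of markers at which the disease haplotype co-segregates. The two placeholder markers between D14S53 and D14S258 used in the usage note are hypothetical, as this passage does not list the intervening markers.

```python
def flanking_markers(markers: list, cosegregating: list) -> tuple:
    """Return (telomeric_flank, centromeric_flank) around the shared co-segregating block.

    markers: marker names ordered from telomere to centromere.
    cosegregating: for each informative affected subject, the set of markers at
    which that subject's disease haplotype co-segregates with the disease.
    A flank is None when the shared block reaches the end of the marker list.
    """
    shared = [m for m in markers if all(m in subject for subject in cosegregating)]
    if not shared:
        raise ValueError("no marker co-segregates in every subject")
    first, last = markers.index(shared[0]), markers.index(shared[-1])
    if last - first + 1 != len(shared):
        raise ValueError("shared co-segregating markers are not contiguous")
    telomeric = markers[first - 1] if first > 0 else None
    centromeric = markers[last + 1] if last + 1 < len(markers) else None
    return telomeric, centromeric
```

With the order `["D14S48", "D14S53", "MARKER_X", "MARKER_Y", "D14S258", "D14S63"]`, a FAD1-like subject co-segregating everywhere centromeric to D14S53 and a FAD3-like subject co-segregating everywhere telomeric to D14S258 give the flanks `("D14S53", "D14S258")`, matching the interval stated above.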

When the haplotype analyses were enlarged to include the reconstructed
genotypes of deceased affected members of the six large pedigrees, as well as data
from the remaining fifteen pedigrees with probabilities for linkage of less than 0.95,
several additional recombinants were detected at one or more marker loci within the
interval between D14S53 and D14S258. Thus, one additional recombinant was
detected in the reconstructed genotype of a deceased affected member of each of three
of the larger FAD pedigrees (FAD1, FAD2 and other related families), and eight
additional recombinants were detected in affected members of five smaller FAD
pedigrees. However, while some of these recombinants might have correctly placed
the AD3 gene within a more defined target region, we were forced to regard these
potentially closer "internal recombinants" as unreliable, not only for the reasons
discussed earlier, but also because they provided mutually inconsistent locations for
the AD3 gene within the D14S53-D14S258 interval.

Construction of a physical contig spanning the AD3 region. 

As an initial step toward cloning the AD3 gene, a contig of overlapping
genomic DNA fragments cloned into yeast artificial chromosome vectors, phage
artificial chromosome vectors and cosmid vectors was constructed (Figure 1b). FISH
mapping studies using cosmids derived from the YAC clones 932c7 and 964f5
suggested that the interval most likely to carry the AD3 gene was at least five
megabases in size. Because the large size of this minimal co-segregating region
would make positional cloning strategies intractable, additional genetic pointers were
sought to focus the search for the AD3 gene on one or more subregions within
the interval flanked by D14S53 and D14S258. Haplotype analyses at the markers
between D14S53 and D14S258 failed to detect statistically significant evidence for
linkage disequilibrium and/or allelic association between the FAD trait and alleles at
any of these markers, irrespective of whether the analyses were restricted to those
pedigrees with early onset forms of FAD or were generalized to include all
pedigrees. This result was not unexpected given the diverse ethnic origins of our
pedigrees. However, when pedigrees of similar ethnic descent were collated, direct
inspection of the haplotypes observed on the disease-bearing chromosome segregating
in different pedigrees of similar ethnic origin revealed two clusters of marker loci
(Table 2). The first of these clusters was located centromeric to D14S77 (D14S786,
D14S277 and D14S268) and spanned the 0.95 Mb physical interval contained in
YAC 78842 (depicted as region B in Figure 1c). The second cluster was located
telomeric to D14S77 (D14S43, D14S273 and D14S7S) and spanned the ~1 Mb
physical interval included within the overlapping YAC clones 964c2, 74163, 797d11
and part of 854f5 (depicted as region A in Figure 1c). Identical alleles were observed
in at least two pedigrees from the same ethnic origin (Table 2). As part of the strategy,
it was reasoned that the presence of shared alleles at one of these groups of physically
clustered marker loci might reflect the co-inheritance of a small physical region
surrounding the ARMP gene on the original founder chromosome in each ethnic
population. Significantly, each of the shared extended haplotypes was rare in normal
Caucasian populations, and allele sharing was not observed at other groups of markers
spanning similar genetic intervals elsewhere on chromosome 14q24.3.
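The allele-sharing inspection described above amounts to finding the longest run of adjacent markers at which two disease-bearing haplotypes carry identical alleles. The sketch below assumes a marker order within the clusters (the order is not given here) and uses invented allele numbers, since the Table 2 data are not reproduced in this section; it is an illustration of the idea, not the procedure actually used.

```python
from typing import Dict, List, Sequence


def longest_shared_run(
    order: Sequence[str], hap_a: Dict[str, int], hap_b: Dict[str, int]
) -> List[str]:
    """Longest run of adjacent markers at which two haplotypes share an allele.

    A marker that is untyped in either haplotype breaks the run.
    """
    best: List[str] = []
    current: List[str] = []
    for marker in order:
        a, b = hap_a.get(marker), hap_b.get(marker)
        if a is not None and a == b:
            current.append(marker)
            if len(current) > len(best):
                best = list(current)
        else:
            current = []
    return best


# Assumed marker order and hypothetical allele numbers for two pedigrees
# of the same ethnic origin.
ORDER = ["D14S786", "D14S277", "D14S268", "D14S77", "D14S43", "D14S273"]
pedigree_1 = {"D14S786": 3, "D14S277": 5, "D14S268": 2,
              "D14S77": 7, "D14S43": 1, "D14S273": 4}
pedigree_2 = {"D14S786": 3, "D14S277": 5, "D14S268": 2,
              "D14S77": 4, "D14S43": 1, "D14S273": 6}
print(longest_shared_run(ORDER, pedigree_1, pedigree_2))
# ['D14S786', 'D14S277', 'D14S268']
```

A shared run only becomes informative once, as the text notes, the extended haplotype is also shown to be rare on normal chromosomes from the same population.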


Transcription mapping and preliminary analysis of candidate genes 

To isolate expressed sequences encoded within both critical intervals, a direct
selection strategy was used, employing immobilized, cloned human genomic DNA as
the hybridization target to recover transcribed sequences from primary complementary
DNA pools derived from human brain mRNA (Rommens et al., 1993).
Approximately 900 putative cDNA fragments of 100 to 600 base pairs were
recovered from regions A and B in Figure 1c. These fragments were hybridized to
Southern blots containing genomic DNA from each of the overlapping YAC clones
and genomic DNA from humans and other mammals. This identified a subset of
151 clones that showed evidence of evolutionary conservation and/or of a
complex structure suggesting that they were derived from spliced mRNA. The
clones within this subset were collated on the basis of physical map location,
cross-hybridization and nucleotide sequence, and were used to screen conventional human
brain cDNA libraries for longer cDNAs. At least 19 independent cDNA clones over
1 kb in length were isolated and then aligned into a partial transcription map of the
AD3 region (Figure 1d). Only three of these transcripts corresponded to known,
characterized genes (cFOS, dihydrolipoamide succinyl transferase and latent
transforming growth factor binding protein 2).

Recovery of Potential Candidate Genes

The open reading frame portion of each candidate gene was recovered
by RT-PCR from mRNA isolated from post-mortem brain tissue of normal control
subjects and from either post-mortem brain tissue or cultured fibroblast cell lines of
affected members of six pedigrees definitively linked to chromosome 14. The RT-PCR
products were then screened for mutations using chemical cleavage and
restriction endonuclease fingerprinting-single-strand conformation
polymorphism methods (Saleeba and Cotton, 1993; Liu and Sommer, 1995), and by
direct nucleotide sequencing. With one exception, none of the genes examined,
although of interest, showed sequence changes that were unique to affected subjects
and co-segregated with the disease. The single exception was the candidate gene
represented by clone S182, which contained a series of nucleotide changes that were
not observed in normal subjects and that altered the predicted amino acid sequence in
affected subjects. Although nucleotide sequence differences were also observed in some
of the other genes, most were in the 3' untranslated regions and none were unique to
AD-affected subjects.
The remaining sequences, a subset of which are mapped in Figure 1b together
with additional putative transcribed sequences not identified in Figure 1c, are
identified in the sequence listings as 14 through 43. Sequence ID Nos: 14 to 43
represent neighbouring genes or fragments of neighbouring genes adjacent to the
hARMP gene, or possibly additional coding fragments arising from alternative splicing
of hARMP. Sequence ID Nos: 44-125 and 149-159 represent neighbouring
genomic fragments containing both exon and intron information. Such sequences are
useful for creating primers, diagnostic tests and altered regulatory sequences, and
the adjacent genomic sequences can be used to create better animal models.

Characterization of the hARMP gene

Hybridization of the S182 clone to northern blots identified a transcript
expressed widely in many areas of brain and peripheral tissues as a major 3.0 kb
transcript and a minor transcript of 7.0 kb (Figures 4 and 5). Although the identity
of the ~7.0 kb transcript is unclear, two observations suggest that the ~3.0 kb
transcript represents an active product of the gene. First, hybridization of the S182 clone
to northern blots containing mRNA from a variety of murine tissues, including brain,
identifies only a single transcript identical in size to the ~3.0 kb human transcript.
Second, all of the longer cDNA clones recovered to date (2.6-2.8 kb), which include both 5'
and 3' UTRs and which account for the ~3.0 kb band on the northern blot, have
mapped exclusively to the same physical region of chromosome 14. From these
experiments, the ~7.0 kb transcript could represent either a rare alternatively spliced
or polyadenylated isoform of the ~3.0 kb transcript, or another gene
with homology to S182.

The nucleotide sequence of the major transcript was determined from the
consensus of eleven independent longer cDNA clones and from three independent clones
recovered by standard 5' rapid amplification of cDNA ends, and it bears no significant
homology to other human genes. The cDNA of the sequenced transcript is provided
in Sequence ID No: 1 and the predicted amino acid sequence is provided in Sequence
ID No: 2. The cDNA sequence of another sequenced human clone is also provided
as Sequence ID NO: 132, and its predicted amino acid sequence is provided in SEQ
ID NO: 133.

Analysis of the 5' end of multiple cDNA clones and RT-PCR products, as well
as of corresponding genomic clones, indicates that the 5' UTR is contained within at least
two exons and that transcription either begins from two different start sites and/or
one of the early 5' untranslated exons is alternatively spliced (Table 6). The longest
predicted open reading frame contains 467 amino acids, with a small alternatively
spliced exon of 4 amino acids at 25 codons from the putative start codon (Table 3).
This putative start codon is the first in-phase ATG located 63 bp downstream of a
TGA stop codon, and there is no classical Kozak consensus sequence around the first two
in-phase ATG sequences (Rogaev et al., in preparation). Like other genes lacking
classical 'strong' start codons, the putative 5' UTR of the human transcripts is rich
in GC.
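The ORF definition used above (the first in-phase ATG, read to the next in-frame stop codon, taking the longest such stretch) can be sketched as a simple scan of the three forward reading frames. This is a minimal illustration, not the software used in the study; it ignores the reverse strand and any ORF that runs off the end without a stop, and the toy sequence is invented (it deliberately places an in-frame TGA upstream of the ATG, as described for S182).

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Tuple[int, int]:
    """Return 0-based, half-open (start, end) of the longest forward-strand ORF.

    The ORF runs from the first in-frame ATG through the next in-frame stop
    codon inclusive. Returns (0, 0) if no complete ORF is found.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # first in-frame ATG since the last stop
            elif codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ATG in this frame
    return best


seq = "CCTGAATGAAATTTGGGTAGCC"  # invented: in-frame TGA, then ATG ... TAG
start, end = longest_orf(seq)
print(start, end, (end - start) // 3 - 1)  # 5 20 4  (4 codons before the stop)
```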

Comparison of the nucleic acid and predicted amino acid sequences with
available databases using the BLAST alignment paradigms revealed modest amino
acid similarity with the C. elegans sperm integral membrane protein SPE-4 (p = 1.5e" *;
24-37% identity over three groups of at least fifty residues) and weaker similarity
to portions of several other membrane-spanning proteins, including mammalian
chromogranin A and the alpha subunit of mammalian voltage-dependent calcium channels
(Altschul et al., 1990). This level of similarity clearly establishes that S182 and
SPE-4 are not the same gene. The amino acid sequence similarities across putative
transmembrane domains may occasionally yield alignments that simply arise from the
limited number of hydrophobic amino acids, but there is also extended sequence
alignment between the S182 protein and SPE-4 at several hydrophilic domains. Both the
putative S182 protein and SPE-4 are predicted to be of comparable size (467 and 465
residues, respectively) and to contain at least seven transmembrane domains, with a
large acidic domain preceding the final predicted transmembrane domain. The S182
protein does have a longer predicted hydrophilic region at the N-terminus.
Further investigation of hARMP has revealed a host of sequence fragments
which form the hARMP gene and include intron sequence information, 5' end
untranslated sequence information and 3' end untranslated sequence information
(Table 6). Such sequence fragments are identified in Sequence ID Nos. 6 to 13.

Mutations in the S182 transcript 

Direct sequencing of overlapping RT-PCR products spanning the 3.0 kb S182
transcript isolated from affected members of the six large pedigrees linked to
chromosome 14 led to the discovery of eight missense mutations, with mutations found
in each of the six pedigrees (Table 7, Figure 2). Each of these mutations co-segregated
with the disease in the respective pedigrees [Figures 3(a)(b)(c)(d)(e)] and was absent
from 142 unrelated neurologically normal subjects drawn from the same ethnic origins
as the FAD pedigrees (284 unrelated chromosomes).

The location of the gene within the physical interval segregating with the AD3
trait, the presence of eight different missense mutations that co-segregate with the
disease trait in six pedigrees definitively linked to chromosome 14, and the absence
of these mutations from 284 independent normal chromosomes together confirm
that the hARMP gene is the AD3 locus. Further biological support for this hypothesis
arises both from the fact that the residues mutated in FAD kindreds are conserved in
evolution (Table 3) and occur in domains of the protein which are also highly
conserved, and from the fact that the S182 gene product is expressed at high levels
in most regions of the brain, including those most severely affected in AD.
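Absence from 284 control chromosomes limits, but cannot exclude, the possibility that a variant is a rare neutral polymorphism. A standard way to quantify this (not a calculation reported in the text) is the exact one-sided binomial upper bound on the allele frequency when zero copies are observed; the sketch assumes the 284 chromosomes are independent draws from the population.

```python
def zero_count_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper confidence limit on an allele frequency p when
    the allele is seen on 0 of n independent chromosomes.

    Solves (1 - p) ** n = 1 - confidence, i.e. the largest p for which
    observing no copies still has probability at least 1 - confidence.
    """
    if n <= 0 or not 0.0 < confidence < 1.0:
        raise ValueError("n must be positive and confidence must lie in (0, 1)")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


bound = zero_count_upper_bound(284)
print(f"{bound:.4f} vs rule of three {3 / 284:.4f}")  # 0.0105 vs rule of three 0.0106
```

In other words, under these assumptions each mutation is unlikely (at 95% confidence) to be carried by more than about 1% of control chromosomes; the co-segregation data carry the rest of the argument.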

The DNA sequence for the hARMP gene as cloned has been incorporated into
the plasmid Bluescript. This stable vector has been deposited at the ATCC under accession
number on April 28, 1995.

Several mutations in the hARMP gene have been identified which cause a
severe type of familial Alzheimer's Disease. One, or a combination, of these
mutations may be responsible for this form of Alzheimer's Disease as well as for several
other neurological disorders. The mutations may be any form of nucleotide sequence
alteration or substitution. Specific disease-causing mutations in the form of nucleotide
and/or amino acid substitutions have been located, although we anticipate that additional
mutations will be found in other families. Each of these nucleotide substitutions
occurred within the putative ORF of the S182 transcript and would be predicted to
change the encoded amino acid at the following positions, numbering from the first
putative initiation codon. The mutations are listed with respect to their nucleotide
locations in Sequence ID No: 1 and Sequence ID NO: 132 (an additional human
clone) and their amino acid locations in Sequence ID No: 2 and Sequence ID NO: 133 (the
additional human clone).

i) nucleotide 685, A→C: Met146Leu
ii) nucleotide 737, A→G: His163Arg
iii) nucleotide 986, C→A: Ala246Glu
iv) nucleotide 1105, C→G: Leu286Val
v) nucleotide 1478, G→A: Cys410Tyr
vi) nucleotide 1027, C→T: Ala260Val
vii) nucleotide 1102, C→T: Ala285Val
viii) nucleotide 1422, C→G: Leu392Val
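The list pairs a cDNA coordinate with a codon number counted from the first putative initiation codon. The conversion between the two can be sketched as below. The start coordinate is not stated in this section, so it is an assumption here. Working back from the genetic code (for example, Met146Leu by A→C can only be ATG→CTG, a first-position change, and His163Arg by A→G must be CAT/CAC→CGT/CGC, a second-position change), entries i) to v) are all consistent with the A of the initiator ATG at nucleotide 250, which the example uses; entries vi) to viii) fit an offset one nucleotide smaller (249), possibly because they are numbered against the other clone. Neither offset has been checked against the sequence listings.

```python
from typing import NamedTuple


class CodonPosition(NamedTuple):
    codon: int     # 1-based codon number; the initiator ATG is codon 1
    position: int  # 1, 2 or 3 within the codon


def locate(nucleotide: int, orf_start: int) -> CodonPosition:
    """Map a 1-based cDNA coordinate to its codon, given the 1-based
    coordinate of the A of the initiator ATG."""
    offset = nucleotide - orf_start
    if offset < 0:
        raise ValueError("nucleotide lies upstream of the start codon")
    return CodonPosition(offset // 3 + 1, offset % 3 + 1)


ASSUMED_START = 250  # hypothetical, inferred from entries i) to v)
for nt in (685, 737, 986, 1105, 1478):
    # Expected codons: 146, 163, 246, 286, 410
    print(nt, locate(nt, ASSUMED_START))
```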



The Met146Leu, Ala246Glu and Cys410Tyr mutations have not been detected
in the genomic DNA of affected members of the eight remaining small early onset
autosomal dominant FAD pedigrees, or of six additional families in our collection with
late-onset FAD. We predict that such mutations would not commonly occur
in late-onset FAD, which has been excluded by genetic linkage studies from the more
aggressive form of AD linked to chromosome 14q24.3 (St George-Hyslop, P et al.,
1992; Schellenberg et al., 1993). The His163Arg mutation has been found in the
genomic DNA of affected members of one additional FAD pedigree for which there is
positive statistical evidence for linkage to chromosome 14. The age of onset in
affected members was consistent with that of affected individuals from families
linked to chromosome 14.

Mutations Ala260Val, Ala285Val and Leu392Val all occur within the acidic
hydrophilic loop between putative transmembrane domain 6 (TM6) and transmembrane
domain 7 (TM7) (Figure 6). Two of these mutations (A260V and A285V), as well as the
L286V mutation, are also located in the alternatively spliced domain.

All eight of the mutations can be assayed by a variety of strategies (direct
nucleotide sequencing, allele-specific oligonucleotides, ligation polymerase chain reaction,
SSCP, RFLPs, etc.) using RT-PCR products representing the mature mRNA/cDNA
sequence, or using genomic DNA. Allele-specific oligonucleotides were chosen for
assaying the mutations. For the A260V and A285V mutations, genomic DNA carrying the
exon was amplified using the same PCR primers and methods as for the L286V
mutation. PCR products were then denatured and slot blotted onto duplicate nylon
membranes using the slot blot protocol described for the C410Y mutation.
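An allele-specific oligonucleotide assay of this kind uses a pair of short probes that are identical except at the variant base, so that under stringent washing each probe hybridizes only to its own allele. Designing such a pair can be sketched as follows; the probe length (17 nt by default) and the toy sequence are assumptions, hybridization and wash conditions are not modelled, and the probes actually used are not given in the text.

```python
from typing import Tuple


def aso_pair(seq: str, pos: int, alt: str, length: int = 17) -> Tuple[str, str]:
    """Return (wild-type, mutant) probes of odd `length` centred on the
    1-based position `pos`; the two probes differ only at the central base."""
    if length % 2 == 0:
        raise ValueError("use an odd length so the variant sits in the middle")
    half = length // 2
    i = pos - 1  # convert to a 0-based index
    if i - half < 0 or i + half >= len(seq):
        raise ValueError("not enough flanking sequence around pos")
    wild = seq[i - half:i + half + 1].upper()
    if wild[half] == alt.upper():
        raise ValueError("alt allele equals the reference base")
    mutant = wild[:half] + alt.upper() + wild[half + 1:]
    return wild, mutant


# Invented sequence: a C->T change at position 10 with 5-nt probes.
print(aso_pair("ACGTACGTACGTACGTACGT", pos=10, alt="T", length=5))
# ('TACGT', 'TATGT')
```

Placing the variant centrally maximizes the destabilizing effect of the single mismatch, which is why an odd probe length is enforced.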

All of the nucleotide substitutions co-segregated with the disease in their
respective pedigrees (Figures 3a to 3e); none was seen in asymptomatic family
members aged more than two standard deviations beyond the mean age of onset, and
none was present on 284 chromosomes from unrelated neurologically normal
subjects drawn from comparable ethnic origins.

Identification of an Alternative Splice Form of the ARMP Gene Product 

During sequencing studies of RT-PCR products for the ARMP gene recovered
from a variety of tissues, it was discovered that some peripheral tissues (principally
white blood cells) demonstrated two alternative splice forms of the ARMP gene. One
form is identical to the (putatively 467 amino acid) isoform constitutively expressed
in all brain regions. The alternative splice form results from the exclusion of the
segment of the cDNA between base pairs 1018 and 1116 inclusive, and results in a
truncated isoform of the ARMP protein in which the hydrophobic part of the
hydrophilic acidically-charged loop immediately C-terminal to TM6 is removed. This
alternatively spliced isoform is therefore characterized by preservation of the sequence
N-terminal to and including the tyrosine at position 256, replacement of the aspartate
at 257 by alanine, and splicing on to the C-terminal part of the protein from and
including tyrosine 291. Such splicing differences are often associated with important
functional domains of proteins. This argues that this hydrophilic loop (and
consequently the N-terminal hydrophilic loop with similar amino acid charge) are
active functional domains of the ARMP product and thus sites for therapeutic
targeting.
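The coordinates given for this splice form can be checked for internal consistency with simple arithmetic: an excised segment preserves the reading frame only if its length is a multiple of three, and the number of codons removed should match the net change in protein length. The helper below is a minimal sketch written for this check.

```python
from typing import Tuple


def describe_exclusion(first_nt: int, last_nt: int) -> Tuple[int, bool, int]:
    """For an excised cDNA segment given by inclusive 1-based coordinates,
    return (length in nt, whether the reading frame is preserved,
    net codons removed when it is)."""
    if last_nt < first_nt:
        raise ValueError("last_nt must not precede first_nt")
    length = last_nt - first_nt + 1
    in_frame = length % 3 == 0
    return length, in_frame, (length // 3 if in_frame else 0)


print(describe_exclusion(1018, 1116))  # (99, True, 33)
# Protein side: residues 257-290 (34 residues) are replaced by a single Ala,
# a net loss of 34 - 1 = 33 residues, matching the 33 codons removed.
```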

ARMP Protein

With respect to DNA SEQ ID NO: 1 and DNA SEQ ID NO: 132, analysis of
the sequence of overlapping cDNA clones predicted an ORF encoding a protein of 467 amino
acids when read from the first in-phase ATG start codon, with a molecular mass of
approximately 52.6 kDa. As described later, the molecular weight of the protein can vary
because polymorphisms in the protein or alternative splicing of the transcript may
substitute or delete amino acids.
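A molecular mass of 52.6 kDa over 467 residues corresponds to an average residue mass of about 112.6 Da, close to the usual rule of thumb of roughly 110 Da per residue. A sequence-based estimate simply sums average residue masses and adds one water for the termini, as sketched below. The mass table is the standard set of average residue masses, reproduced here from memory and worth checking against a reference table; post-translational modifications such as phosphorylation are ignored, and the ARMP sequence itself (Sequence ID No: 2) is not reproduced here.

```python
from typing import Dict

# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS: Dict[str, float] = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N- and C-termini


def molecular_weight(seq: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide."""
    try:
        return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER
    except KeyError as err:
        raise ValueError(f"unknown residue {err.args[0]!r}") from None


print(round(molecular_weight("GG"), 2))  # 132.12 (glycylglycine)
print(round(52_600 / 467, 1))            # 112.6 Da per residue on average
```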

Analysis of the predicted amino acid sequence using the Hopp and Woods
algorithm suggested that the protein product is a multispanning integral membrane
protein, such as a receptor, a channel protein or a structural membrane protein. The
absence of a recognizable signal peptide and the paucity of glycosylation sites are
noteworthy, and the hydropathy profile suggests that the protein is less likely to be
a soluble protein with a highly compact three-dimensional structure.
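A hydropathy profile of this kind is a sliding-window average of per-residue scale values. The sketch below uses the Hopp-Woods hydrophilicity scale named in the text (negative values are hydrophobic). The 19-residue window, chosen to match the membrane-spanning domain length described for ARMP, and the -1.0 cut-off are illustrative assumptions; Hopp and Woods originally used a shorter window for predicting antigenic sites, and the toy sequence is invented.

```python
from typing import Dict, List

# Hopp-Woods hydrophilicity values; negative = hydrophobic.
HOPP_WOODS: Dict[str, float] = {
    "R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def window_profile(seq: str, window: int = 19) -> List[float]:
    """Mean Hopp-Woods value of every full window along the sequence."""
    values = [HOPP_WOODS[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def hydrophobic_windows(seq: str, window: int = 19,
                        threshold: float = -1.0) -> List[int]:
    """1-based start positions of windows whose mean is at or below threshold."""
    return [i + 1 for i, v in enumerate(window_profile(seq, window))
            if v <= threshold]


toy = "D" * 10 + "L" * 19 + "D" * 10  # invented: one hydrophobic stretch
print(hydrophobic_windows(toy))  # [8, 9, 10, 11, 12, 13, 14]
```

A run of consecutive flagged windows marks one candidate membrane-spanning segment; counting such runs along the real sequence is how a multispanning topology would be inferred.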

The protein may be a cellular protein with a highly compact three-dimensional
structure, in which respect it may be similar to APOE, which is also related to
Alzheimer's Disease. In light of this putative functional role, it is proposed that this
protein be labelled the Alzheimer Related Membrane Protein (ARMP). The
protein also contains a number of potential phosphorylation sites, one of which is the
consensus site for MAP kinase, which is also involved in the hyperphosphorylation of
tau during the conversion of normal tau to neurofibrillary tangles. This consensus
sequence may provide a common putative pathway linking this protein and other
known biochemical aspects of Alzheimer's Disease, and would represent a likely
therapeutic target. Review of the protein structure reveals two sequences, YTPF
(residues 115-119) and SITE (residues 353-356), which represent the S/T-P motif
that is the MAP kinase consensus sequence. Several other phosphorylation sites
exist with consensus sequences for Protein Kinase C activity. Because protein kinase
C activity is associated with differences in the metabolism of APP which are relevant
to Alzheimer's Disease, these sites on the ARMP protein and its homologues are sites
for therapeutic targeting.
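Locating such consensus sites is a pattern scan over the protein sequence. The sketch uses the minimal S/T-P motif given in the text for MAP kinase and the minimal [ST]-x-[RK] pattern commonly used for protein kinase C sites (as in PROSITE). Both patterns are short and match frequently by chance, so hits are candidates only; the toy sequence is invented apart from the YTPF tetrapeptide quoted in the text.

```python
import re
from typing import Dict, List

# Zero-width lookaheads so that overlapping matches are all reported.
PATTERNS = {
    "map_kinase_ST_P": re.compile(r"(?=[ST]P)"),
    "pkc_ST_x_RK": re.compile(r"(?=[ST].[RK])"),
}


def find_motifs(seq: str) -> Dict[str, List[int]]:
    """1-based positions of the phosphorylatable S/T for each motif."""
    seq = seq.upper()
    return {name: [m.start() + 1 for m in pattern.finditer(seq)]
            for name, pattern in PATTERNS.items()}


print(find_motifs("AYTPFGSAKE"))
# {'map_kinase_ST_P': [3], 'pkc_ST_x_RK': [7]}
```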

The N-terminus is characterized by a highly hydrophilic, acidically charged domain
with several potential phosphorylation sites, followed sequentially by a
hydrophobic membrane-spanning domain of 19 residues; a charged hydrophilic loop;
five additional hydrophobic membrane-spanning domains interspersed with short
(5-20 residue) hydrophilic domains; an additional, larger acidic hydrophilic charged
loop; and at least one and possibly two other hydrophobic, potentially membrane-spanning
domains, culminating in a polar domain at the C-terminus (Table 4 and
Figure 6B). The presence of seven membrane-spanning domains is characteristic of
several classes of G-protein-coupled receptors, but is also observed in other proteins,
including channel proteins.

Comparison of the nucleic acid and predicted amino acid sequences with
available databases using the BLAST alignment paradigms revealed amino acid
similarity with the C. elegans sperm integral membrane protein SPE-4 and similarity
to several other membrane-spanning proteins, including mammalian chromogranin A
and the alpha subunit of mammalian voltage-dependent calcium channels.

The similarity between the putative products of the spe-4 and ARMP genes
implies that they may have similar activities. The SPE-4 protein of C. elegans
appears to be involved in the formation and stabilization of the fibrous body-membrane
organelle (FBMO) complex during spermatogenesis. The FBMO is a
specialized Golgi-derived organelle, consisting of a membrane-bound vesicle attached
to and partly surrounding a complex of parallel protein fibers, and may be involved
in the transport and storage of soluble and membrane-bound polypeptides. Mutations
in spe-4 disrupt the FBMO complexes and arrest spermatogenesis. Therefore, the
physiologic function of SPE-4 may be either to stabilize interactions during integral
membrane budding and fusion events, or to stabilize interactions between the
membrane and fibrillary proteins during the intracellular transport of the FBMO
complex during spermatogenesis. Comparable functions could be envisaged for
ARMP. ARMP could be involved either in the docking of other membrane-bound
proteins such as βAPP, or in the axonal transport and budding and fusion of
membrane-bound vesicles during protein transport, such as in the Golgi apparatus or
the endosome-lysosome system. If this is correct, mutations might be expected to result
in aberrant transport and processing of βAPP and/or abnormal interactions with
cytoskeletal proteins such as the microtubule-associated protein Tau. Abnormalities
in the intracellular and extracellular disposition of both βAPP and Tau are in
fact an integral part of the neuropathologic features of Alzheimer's Disease.
Although the location of the ARMP mutations at highly conserved residues within
conserved domains of the putative protein suggests that they are pathogenic, at least
three of these mutations are conservative, which is commensurate with the onset of
disease in adult life. Because none of the mutations observed so far are deletions or
nonsense mutations that would be expected to cause a loss of function, we cannot
predict whether these mutations will have a dominant gain-of-function effect and
promote aberrant processing of βAPP, or a dominant loss-of-function effect causing
arrest of normal βAPP processing.

An alternative possibility is that the ARMP gene product may represent a
receptor or channel protein. Mutations of such proteins have been causally related
to several other dominant neurological disorders in both vertebrate (e.g., malignant
hyperthermia and hyperkalemic periodic paralysis in humans) and invertebrate
organisms (deg-1(d) mutants in C. elegans). Although the pathology of these other
disorders does not resemble that of Alzheimer's Disease, there is evidence for
functional abnormalities in ion channels in Alzheimer's Disease. For example,
anomalies have been reported in the tetraethylammonium-sensitive 113 pS potassium
channel and in calcium homeostasis. Perturbations in transmembrane calcium fluxes
might be especially relevant in view of the weak homology between S182 and the alpha-1D
subunit of voltage-dependent calcium channels, and the observations that increases
in intracellular calcium in cultured cells can replicate some of the biochemical
features of Alzheimer's Disease, such as alteration in the phosphorylation of the
microtubule-associated protein Tau and increased production of Aβ peptides.

As mentioned, purified normal ARMP protein is characterized by a molecular
weight of 52.6 kDa. The normal ARMP protein, substantially free of other proteins,
is encoded by the aforementioned SEQ ID No. 1 and SEQ ID NO: 132. As will be
discussed later, the ARMP protein and fragments thereof may be made by a variety
of methods. Purified mutant ARMP protein is characterized by an FAD-associated
phenotype in a variety of cells (necrotic death, apoptotic death, granulovascular
degeneration, neurofibrillary degeneration, and abnormalities or changes in the
metabolism of APP, Ca2+, K+ and glucose, in mitochondrial function and energy
metabolism, and in neurotransmitter metabolism, all of which have been found to be
abnormal in human brain and/or peripheral tissue cells in subjects with Alzheimer's
Disease). The mutant ARMP, free of other proteins, is encoded by the mutant DNA
sequence.

Description of the E5-1 gene, a homologue of the ARMP gene

A gene, E5-1, with substantial nucleotide and amino acid homology to the
ARMP gene was identified by using the nucleotide sequence of the cDNA for ARMP
to search databases using the BLASTN paradigm of Altschul et al. (1990). Three
expressed sequence tagged sites (ESTs), identified by accession numbers T03796,
R14600 and R05907, were located which had substantial homology (p < 1.0e-100;
greater than 97% identity over at least 100 contiguous base pairs).

Oligonucleotide primers were produced from these sequences and used to
generate PCR products by reverse transcriptase PCR (RT-PCR). These short
RT-PCR products were partially sequenced to confirm their identity with the
sequences within the database and were then used as hybridization probes to screen
full-length cDNA libraries. Several different cDNAs ranging in size from 1 kb to
2.3 kb were recovered from a cancer cell cDNA library (CaCo-2) and from a human
brain cDNA library (E5-1, Gl-1, cc54, cc32).

The nucleotide sequences of these clones confirmed that all were derivatives
of the same transcript (designated E5-1).

The gene encoding the E5-1 transcript mapped to human chromosome 1 using
hybrid mapping panels, and to two clusters of CEPH Mega YAC clones which have
been placed upon a physical contig map (YAC clones 750g7 and 921d12, mapped by
FISH to 1q41; and YAC clone 787g12, which also contains an EST for the
leukemia-associated phosphoprotein (LAP18) gene, which has been mapped to
1p36.1-p35) (data not shown).

Hybridization of the E5-1 cDNA clones to Northern blots detected a ~2.3
kilobase mRNA band in many tissues, including regions of the brain, as well as a
~2.6 kb mRNA band in muscle, cardiac muscle and pancreas (Figure 7).
In skeletal muscle, cardiac muscle and pancreas, the E5-1 gene is expressed
at relatively higher levels than in brain, and as two different transcripts of ~2.3 kb
and ~2.6 kb. Both of the E5-1 transcripts have sizes clearly distinguishable from
that of the 2.7 kb ARMP transcript, and they did not cross-hybridize with ARMP probes
at high stringency. The cDNA sequence of the E5-1 gene is identified as Sequence
ID No. 136.

The longest ORF within the E5-1 cDNA consensus nucleotide sequence 
predicts a polypeptide containing 448 amino acids (numbering from the first in-phase 
ATG codon which was surrounded by a flCG-agg-GCt-ATG-c Kozak consensus 
sequence) (Sequence ID No. 137). 
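Start-codon context, mentioned here for E5-1 and earlier for ARMP, is usually judged mainly at two positions: a purine at -3 and a G at +4, numbering the A of the ATG as +1. The check below is a simplification of Kozak's consensus offered only as an illustration; the example contexts are invented.

```python
from typing import Tuple


def kozak_key_positions(seq: str, atg_index: int) -> Tuple[bool, bool]:
    """Check the two key Kozak positions around an ATG whose A is at the
    0-based `atg_index` (the A is numbered +1): purine at -3, G at +4."""
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at atg_index")
    if atg_index < 3 or atg_index + 3 >= len(seq):
        raise ValueError("need 3 nt upstream and 1 nt downstream of the ATG")
    purine_at_minus3 = seq[atg_index - 3] in "AG"
    g_at_plus4 = seq[atg_index + 3] == "G"
    return purine_at_minus3, g_at_plus4


print(kozak_key_positions("GCCACCATGG", 6))  # (True, True): strong context
print(kozak_key_positions("TTTTTTATGT", 6))  # (False, False): weak context
```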

A comparison of the amino acid sequences of hARMP and the E5-1 homologue
protein is shown in Table 8. Identical residues are indicated by vertical lines. The
locations of mutations in the E5-1 gene are indicated by downward-pointing arrows.
The locations of the mutations in the hARMP gene are indicated by upward-pointing
arrows. Putative TM domains are in open-ended boxes. The alternatively spliced
exons are denoted by superscripts (E5-1) or subscripts (hARMP).

BLAST alignment analyses also detected significant homology with SPE-4
of C. elegans (P = 3.5e-26; identity = 20-63% over five domains of at least 22
residues), and weak homologies to brain sodium channels (alpha III subunit) and to
the alpha subunit of voltage-dependent calcium channels from a variety of species
(P = 0.02; identities 20-28% over two or more domains, each of at least 35 residues)
(Altschul, 1990). These alignments are similar to those described above for the
ARMP gene. However, the most striking homology to the E5-1 protein was found
with the amino acid sequence predicted for ARMP. ARMP and E5-1 proteins share
63% overall amino acid sequence identity, and several domains display virtually
complete identity (Table 8). Furthermore, all eight residues mutated in ARMP in
subjects with AD3 are conserved in the E5-1 protein (Table 8). As would be
expected, hydrophobicity analyses suggest that both proteins also share a similar
structural organization.

The similarity was greatest in several domains of the protein corresponding
to the intervals between transmembrane domain 1 (TM1) and TM6, and from TM7
to the C-terminus of the ARMP gene product. The main differences from ARMP lie
in the size and amino acid sequence of the acidically charged hydrophilic
loop in the position equivalent to the hydrophilic loop between transmembrane
domains TM6 and TM7 in the ARMP protein, and in the sequence of the N-terminal
hydrophilic domains.

Thus, both proteins are predicted to possess seven hydrophobic putative
transmembrane domains, and both proteins bear large acidic hydrophilic domains at
the N-terminus and between TM6 and TM7 (Figs. 6 and 8). A further similarity
arose from analysis of RT-PCR products from brain and muscle RNA, which revealed
that nucleotides 1153-1250 of the E5-1 transcript are alternatively spliced. These
nucleotides encode amino acids 263-296, which are located within the TM6-TM7 loop
domain of the putative E5-1 protein and which share 94% sequence identity with the
alternatively spliced residues 257-290 in ARMP.

The most noticeable differences between the two predicted amino acid
sequences occur in the central portion of the TM6-TM7
hydrophilic loop (residues 304-374 of ARMP; 310-355 of E5-1) and in the
N-terminal hydrophilic domain (Table 8). By analogy, this domain is also less highly
conserved between the murine and human ARMP genes (identity = 47/60 residues),
and shows no similarity with the equivalent region of SPE-4.

A splice variant of the E5-1 cDNA sequence identified as Sequence ID No.
136 has also been found in all tissues examined. This splice variant lacks the triplet
GAA at nucleotide positions 1338-1340.

A further variant has been found in one normal individual whose E5-1 cDNA
had C replacing T at nucleotide position 626, which does not change the amino acid
sequence.
Mutations of the E5-1 gene associated with Alzheimer's Disease

The strong similarity between ARMP and the E5-1 gene product raised the
possibility that the E5-1 gene might be the site of disease-causing mutations in some
of a small number of early onset AD pedigrees in which genetic linkage studies have
excluded chromosomes 14, 19 and 21. RT-PCR was used to isolate cDNAs
corresponding to the E5-1 transcript from lymphoblasts, fibroblasts or post-mortem
brain tissue of affected members of eight pedigrees with early onset familial AD
(FAD) in which mutations in the βAPP and ARMP genes had previously been
excluded by direct sequencing studies.

Examination of these RT-PCR products detected a heterozygous A→G substitution at nucleotide 1080 in all four affected members of an extended pedigree of Italian origin (Flo10) with early-onset, pathologically confirmed FAD (onset 50-70 years). This mutation would be predicted to cause a Met→Val missense mutation at codon 239 (Table 8).

A second mutation (A→T at nucleotide 787), causing an Asn→Ile substitution at codon 141, was found in affected members of a group of related pedigrees of Volga German ancestry (represented by cell lines AG09369, AG09907, AG09952 and AG09905, Coriell Institute, Camden NJ). Significantly, one subject (AG09907) was homozygous for this mutation, an observation compatible with the inbred nature of these pedigrees. Notably, this subject did not have a clinical picture substantially different from those of subjects heterozygous for the Asn141Ile mutation. Neither of the E5-1 gene mutations was found in 284 normal Caucasian controls, nor were they present in affected members of pedigrees with the AD3 type of AD.

Both of these mutations would be predicted to cause substitution of residues which are highly conserved within the ARMP/E5-1 gene family.
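The nucleotide and codon numbers reported for the two E5-1 mutations agree with each other if the E5-1 open reading frame is assumed to begin (the A of the ATG) at nucleotide 366 of the cDNA numbering. On that assumption, nucleotide 1080 falls at the first position of codon 239 (ATG→GTG, Met→Val) and nucleotide 787 falls at the second position of codon 141 (AAT/AAC→ATT/ATC, Asn→Ile). The start position is an inference from those two figures, not a value stated in the text, and the helper names below are hypothetical. The sketch makes the arithmetic explicit.

```python
# ASSUMPTION: the E5-1 coding sequence starts (A of ATG) at cDNA nucleotide 366.
# This value is inferred from the reported mutations, not stated in the text.
ORF_START = 366

# Only the codons needed for this illustration, not a full genetic code.
CODON_TABLE = {"ATG": "Met", "GTG": "Val", "AAT": "Asn", "AAC": "Asn",
               "ATT": "Ile", "ATC": "Ile"}


def locate(nt_pos: int, orf_start: int = ORF_START) -> tuple[int, int]:
    """Map a 1-based cDNA position to (codon number, position within codon)."""
    offset = nt_pos - orf_start
    if offset < 0:
        raise ValueError("position lies upstream of the assumed start codon")
    return offset // 3 + 1, offset % 3 + 1


def substitute(codon: str, position: int, base: str) -> str:
    """Return the codon with the base at 1-based `position` replaced."""
    return codon[:position - 1] + base + codon[position:]


print(locate(1080))                            # (239, 1)
print(CODON_TABLE[substitute("ATG", 1, "G")])  # Val
print(locate(787))                             # (141, 2)
print(CODON_TABLE[substitute("AAT", 2, "T")])  # Ile
```

Under the same assumption, the silent T→C change at nucleotide 626 maps to the third (wobble) position of codon 87. The GAA triplet missing from the splice variant at nucleotides 1338-1340 corresponds exactly to codon 325, so its loss would leave the reading frame intact.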

The finding of a gene whose product is predicted to share substantial amino acid and structural similarities with the ARMP gene product suggests that these proteins may be functionally related. They may act as independent proteins with overlapping functions but perhaps slightly different specific activities, as physically associated subunits of a multimeric polypeptide, or as independent proteins performing consecutive functions in the same pathway.

The observation of two different missense mutations in conserved domains of the E5-1 protein in subjects with a familial form of AD argues that these mutations, like those in the ARMP gene, are causative of AD. This conclusion is significant. The disease phenotype associated with mutations in the ARMP gene (onset 30-50 years, duration 10 years) is subtly different from that associated with mutations in the E5-1 gene (onset 40-70 years, duration up to 20 years). Even so, the general similarities clearly argue that the biochemical pathway subsumed by members of this gene family is central to the genesis of at least early-onset AD. The subtle differences in disease phenotype may reflect a lower level of expression of the E5-1 transcript in the CNS, or a different role for the E5-1 gene product.

By analogy to the effects of ARMP mutations, E5-1, when mutated, may cause aberrant processing of APP (Amyloid Precursor Protein) into Aβ peptide, hyperphosphorylation of the microtubule-associated protein Tau, and abnormalities of intracellular calcium homeostasis. Interference with these anomalous interactions provides a potential therapy for AD.

Functional Domains of the ARMP Protein are Defined by Splicing Sites and Similarities within Other Members of a Gene Family

The ARMP protein is a member of a novel class of transmembrane proteins which share substantial amino acid homology. The homology is sufficient that certain nucleotide probes and antibodies raised against one member can identify other members of this gene family. The major differences between members of this family reside in the amino acid and nucleotide sequences homologous to the hydrophilic acidic loop domain between putative transmembrane domains 6 and 7 (TM6 and TM7) of the ARMP gene and gene product. This region is alternatively spliced in some non-neural tissues, and is also the site of several pathogenic disease-causing mutations in the ARMP gene. Three features suggest that this loop is an important functional domain of the protein and may confer some specificity on the physiologic and pathogenic interactions which the ARMP gene product undergoes: the variable splicing of this hydrophilic loop, the high density of pathogenic mutations within it, and the fact that its amino acid sequence differs between members of the gene family. Because the N-terminal hydrophilic domain shares the same acidic charge and the same orientation with respect to the membrane, it is very likely that these two domains share functionality, either in a coordinated (together) or an independent fashion (eg. different ligands or functional properties). As a result, everything said about the hydrophilic loop shall apply also to the N-terminal hydrophilic domain.

Knowledge of the specificity of the loop can be used to identify ligands and functional properties of the ARMP gene product (eg. sites of interaction with APP and with cytosolic proteins such as kinases, Tau and MAP). Soluble recombinant fusion proteins can be made, or the nucleotide sequence coding for amino acids within the loop or parts of the loop can be expressed in suitable vectors (yeast two-hybrid, baculovirus and phage-display systems, for instance). These can then be used to identify other proteins which interact with ARMP in the pathogenesis of Alzheimer's disease and other neurological and psychiatric diseases. Therapies can be designed to modulate these interactions and thus to modulate Alzheimer's disease and the other conditions associated with acquired or inherited abnormalities of the ARMP gene or its gene products. The potential efficacy of these therapies can be tested by analyzing the affinity and function of these interactions after exposure to the therapeutic agent. This can be done by standard binding and kinetic measurements (Kd, Vmax, etc.) using synthetic peptides or recombinant proteins corresponding to functional domains of the ARMP gene (or its homologues). An alternative method for assaying the effect on any interactions involving functional domains such as the hydrophilic loop is to monitor changes in the intracellular trafficking and post-translational modification of the ARMP gene product. Suitable techniques are in-situ hybridization, immunohistochemistry, Western blotting and metabolic pulse-chase labelling studies, performed in the presence and in the absence of the therapeutic agents. A third way is to monitor the effects on "downstream" events, including (i) changes in the intracellular metabolism, trafficking and targeting of APP and its products; and (ii) changes in second-messenger events, eg. cAMP, intracellular Ca2+ and protein kinase activities.
Isolation and Purification of the ARMP Protein

The ARMP protein may be isolated and purified by methods selected on the basis of properties revealed by its sequence. Since the protein has the properties of a membrane-spanning protein, a membrane fraction would be isolated from cells in which the protein is highly expressed (eg. central nervous system cells or cells from other tissues). The proteins would then be extracted and solubilized using a detergent.

Purification can be achieved using protein purification procedures such as chromatography methods (gel-filtration, ion-exchange and immunoaffinity), high-performance liquid chromatography (RP-HPLC, ion-exchange HPLC, size-exclusion HPLC, high-performance chromatofocusing and hydrophobic interaction chromatography) or precipitation (immunoprecipitation). Polyacrylamide gel electrophoresis can also be used to isolate the ARMP protein based on its molecular weight, charge properties and hydrophobicity.

Similar procedures to those just mentioned could be used to purify the protein from cells transfected with vectors containing the ARMP gene (eg. baculovirus systems, yeast expression systems, eukaryotic expression systems).

Purified protein can be used in further biochemical analyses to establish secondary and tertiary structure. This structural information may aid in the design of pharmaceuticals intended to treat the disease. Such pharmaceuticals could interact with the protein, alter its charge configuration or its charge interactions with other proteins or with lipid or saccharide moieties, or alter its function in membranes as a transporter, channel or receptor and/or in cells as an enzyme or structural protein.

The protein can also be purified by creating a fusion protein, ligating the ARMP cDNA sequence to a vector which contains sequence for another peptide (eg. GST, glutathione S-transferase). The fusion protein is expressed and recovered from prokaryotic (eg. bacterial) or eukaryotic (eg. baculovirus-infected insect) cells. The fusion protein can then be purified by affinity chromatography based upon the fusion vector sequence, and the ARMP protein can be further purified from the fusion protein by enzymatic cleavage of the fusion protein.

Isolating mouse ARMP gene

In order to characterize the physiological significance of the normal and mutant hARMP gene and gene products in a transgenic mouse model, it was necessary to recover a mouse homologue of the hARMP gene. We recovered a murine homologue of the hARMP gene by screening a mouse cDNA library with a labelled human DNA probe. In this manner we recovered a 2 kb partial transcript (representing the 3' end of the gene) and several RT-PCR products representing the 5' end. Sequencing of the consensus cDNA transcript of the murine homologue revealed substantial amino acid identity with the human sequence. The cDNA sequence is identified as Sequence ID No. 3 and the predicted amino acid sequence is provided in Sequence ID No. 4. Further sequencing of the mouse cDNA transcript has provided the complete coding sequence, identified as SEQ ID NO: 134, and the predicted amino acid sequence from this sequence is provided in SEQ ID NO: 135. More importantly, all of the amino acids that were mutated in the FAD pedigrees were conserved between the murine homologue and the normal human variant (Table 3). This conservation of the ARMP gene, as shown in Table 3, indicates that an orthologous gene exists in the mouse (mARMP), and it is now possible to clone mouse genomic libraries using human ARMP probes. This will also make it possible to identify and characterize the ARMP gene in other species. This also provides evidence of animals with various disease states or disorders currently known or yet to be elucidated.

Transgenic Mouse Model 

The creation of a mouse model for Alzheimer's Disease is important to the understanding of the disease and for the testing of possible therapies. Currently no unambiguous viable animal model for Alzheimer's Disease exists.

There are several ways in which to create an animal model for Alzheimer's Disease. Generation of a specific mutation in the mouse gene, such as the identified hARMP gene mutations, is one strategy. Secondly, we could insert a wild type human gene and/or humanize the murine gene by homologous recombination. Thirdly, it is also possible to insert a mutant (single or multiple) human gene as genomic or minigene cDNA constructs using wild type, mutant or artificial promoter elements. Fourthly, knock-out of the endogenous murine genes may be accomplished by the insertion of artificially modified fragments of the endogenous gene by homologous recombination. The modifications include insertion of mutant stop codons, the deletion of DNA sequences, or the inclusion of recombination elements (loxP sites) recognized by enzymes such as Cre recombinase.

To inactivate the mARMP gene, chemical or X-ray mutagenesis of mouse gametes, followed by fertilization, can be applied. Heterozygous offspring can then be identified by Southern blotting, to demonstrate loss of one allele by dosage, or by using RFLP markers to demonstrate failure to inherit one parental allele.

To create a transgenic mouse, a mutant version of hARMP or mARMP can be inserted into the mouse germ line using standard techniques of oocyte microinjection, or transfection or microinjection into stem cells. Alternatively, if it is desired to inactivate or replace the endogenous mARMP gene, homologous recombination using embryonic stem cells may be applied.

For oocyte injection, one or more copies of the mutant or wild type ARMP gene can be inserted into the pronucleus of a just-fertilized mouse oocyte. This oocyte is then reimplanted into a pseudo-pregnant foster mother. The liveborn mice can then be screened for integrants by analysis of tail DNA for the presence of human ARMP gene sequences. The transgene can be any of the following: a complete genomic sequence injected as a YAC, BAC, PAC or other chromosomal DNA fragment; a cDNA with either the natural promoter or a heterologous promoter; or a minigene containing all of the coding region and other elements found to be necessary for optimum expression.

Retroviral infection of early embryos can also be used to insert the mutant or wild type hARMP. In this method, the mutant or wild type hARMP is inserted into a retroviral vector, which is used to directly infect mouse embryos during the early stages of development to generate chimeras, some of which will lead to germline transmission. Similar experiments can be conducted in the case of mutant proteins, using mutant murine or other animal ARMP gene sequences.
Homologous recombination using stem cells allows for the screening of gene transfer cells to identify the rare homologous recombination events. Once identified, these can be used to generate chimeras by injection of mouse blastocysts, and a proportion of the resulting mice will show germline transmission from the recombinant line. This methodology is especially useful if inactivation of the mARMP gene is desired. For example, inactivation of the mARMP gene can be done by designing a DNA fragment which contains sequences from a mARMP exon flanking a selectable marker. Homologous recombination leads to the insertion of the marker sequences in the middle of an exon, inactivating the mARMP gene. DNA analysis of individual clones can then be used to recognize the homologous recombination events.

It is also possible to create mutations in the mouse germline by injecting 
oligonucleotides containing the mutation of interest and screening the resulting cells 
by PCR. 

This embodiment of the invention has the most significant commercial value as a mouse model for Alzheimer's Disease. Because of the high percentage of sequence conservation between human and mouse, it is contemplated that an orthologous gene will also exist in many other species. It is thus contemplated that it will be possible to generate other animal models using similar technology.

Screening and Diagnosis for Alzheimer's Disease

General Diagnostic Uses of the ARMP Gene and Gene Product 

The ARMP gene and gene products will be useful for diagnosis of Alzheimer's disease, presenile and senile dementias, psychiatric diseases such as schizophrenia and depression, and neurologic diseases such as stroke and cerebral hemorrhage. All of these are seen to a greater or lesser extent in symptomatic subjects bearing mutations in the ARMP gene or in the APP gene. Diagnosis of inherited cases of these diseases can be accomplished by analysis of the nucleotide sequence (including the genomic and cDNA sequences included in this patent). Diagnosis can also be achieved by monitoring alterations in electrophoretic mobility, by reaction with specific antibodies to mutant or wild-type ARMP gene products, and by functional assays demonstrating altered function of the ARMP gene product. In addition, the ARMP gene and ARMP gene products can be used to search for inherited anomalies in the gene and/or its products (as well as those of the homologous gene). They can also be used for diagnosis in the same way as they can be used for diagnosis of non-genetic cases.

Diagnosis of non-inherited cases can be made by observation of alterations in ARMP transcription, translation, and post-translational modification and processing. Alterations in the intracellular and extracellular trafficking of ARMP gene products in the brain and peripheral cells are also informative. Such changes will include alterations in the amount of ARMP messenger RNA and/or protein, alteration in phosphorylation state, abnormal intracellular location/distribution, abnormal extracellular distribution, etc. Such assays will include:

- Northern blots, with ARMP-specific and ARMP-non-specific nucleotide probes which also cross-react with other members of the gene family;
- Western blots and enzyme-linked immunosorbent assays (ELISA), with antibodies raised specifically to ARMP, to various functional domains of ARMP, to other members of the homologous gene family, and to various post-translational modification states including glycosylated and phosphorylated isoforms.

These assays can be performed on peripheral tissues (eg. blood cells, plasma, cultured or other fibroblast tissues, etc.) as well as on biopsies of CNS tissues obtained antemortem or postmortem, and upon cerebrospinal fluid. Such assays might also include in-situ hybridization and immunohistochemistry. These techniques localize messenger RNA and protein to specific subcellular compartments and/or within neuropathological structures associated with these diseases, such as neurofibrillary tangles and amyloid plaques.

Screening for Alzheimer's Disease

Screening for Alzheimer's Disease as linked to chromosome 14 may now be 
readily carried out because of the knowledge of the mutations in the gene. 

People at high risk for Alzheimer's Disease (because it is present in the family pedigree), individuals not previously known to be at risk, or people in general may be screened routinely using probes to detect the presence of a mutant ARMP gene by a variety of techniques. Genomic DNA used for the diagnosis may be obtained from body cells, such as those present in the blood, tissue biopsy, surgical specimen, or autopsy material. The DNA may be isolated and used directly for detection of a specific sequence or may be PCR amplified prior to analysis. RNA or cDNA may also be used. To detect a specific DNA sequence, hybridization using specific oligonucleotides, direct DNA sequencing, restriction enzyme digestion, RNase protection, chemical cleavage and ligase-mediated detection are all methods which can be utilized. Oligonucleotides specific to mutant sequences can be chemically synthesized and labelled, either radioactively with isotopes or non-radioactively using biotin tags. The labelled oligonucleotides are then hybridized to individual DNA samples immobilized on membranes or other solid supports by dot-blot or by transfer from gels after electrophoresis. The presence or absence of these mutant sequences is then visualized using methods such as autoradiography, fluorometry, or colorimetric reaction. Examples of suitable PCR primers which are useful, for example, in amplifying portions of the subject sequence containing the aforementioned mutations are set out in Table 5. This table also sets out the change in enzyme site, which provides a useful diagnostic tool as defined herein.
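Allele-specific oligonucleotide probing, as described in this section, relies on a pair of short probes that differ only at the mutated base. Under stringent conditions, each probe hybridizes stably only to its perfectly matched allele. The sketch below shows one way such a pair might be derived from a reference sequence. The function name, the flank length of 8 nt on each side, and the toy sequence are all hypothetical choices. Practical probe design would also weigh melting temperature and secondary structure.

```python
def aso_pair(reference: str, index: int, alt_base: str,
             flank: int = 8) -> tuple[str, str]:
    """Return (wild-type probe, mutant probe) centred on a 0-based index.

    Both probes share the same flanking sequence and differ only at the
    central base, which is what makes them allele-specific.
    """
    if not flank <= index < len(reference) - flank:
        raise ValueError("not enough flanking sequence around the site")
    left = reference[index - flank:index]
    right = reference[index + 1:index + 1 + flank]
    return left + reference[index] + right, left + alt_base + right


toy_reference = "ACGTACGTACGTACGTACG"  # hypothetical 19-nt sequence
wild_type, mutant = aso_pair(toy_reference, 9, "T")
print(wild_type)  # CGTACGTACGTACGTAC
print(mutant)     # CGTACGTATGTACGTAC
```

Centring the variant base maximizes the destabilizing effect of a single mismatch, which is why the flanks are kept symmetric here.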

Direct DNA sequencing reveals sequence differences between normal and mutant ARMP DNA. Cloned DNA segments may be used as probes to detect specific DNA segments. PCR can be used to enhance the sensitivity of this method. PCR is an enzymatic amplification directed by sequence-specific primers. It involves repeated cycles of heat denaturation of the DNA, annealing of the complementary primers and extension of the annealed primers with a DNA polymerase. This results in an exponential increase of the target DNA.
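The exponential increase described above can be written as N_n = N_0 (1 + E)^n. Here N_0 is the starting number of target molecules, n is the number of cycles, and E is the per-cycle efficiency (E = 1 for perfect doubling). The sketch below evaluates that idealized model. Real reactions plateau in later cycles, which this formula does not capture, and the function name is illustrative only.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: initial_copies * (1 + efficiency) ** cycles.

    efficiency = 1.0 means every template is copied in every cycle; the
    model ignores the plateau reached when reagents become limiting.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(pcr_copies(1, 30))    # 1073741824.0, i.e. 2**30
print(pcr_copies(100, 10))  # 102400.0
```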

Other nucleotide sequence amplification techniques may be used, such as ligation-mediated PCR, anchored PCR and enzymatic amplification, as would be understood by those skilled in the art.

Sequence alterations may also generate fortuitous restriction enzyme recognition sites, which are revealed by the use of appropriate enzyme digestion followed by gel-blot hybridization. DNA fragments carrying the site (normal or mutant) are detected by their increase or reduction in size, or by the increase or decrease of corresponding restriction fragment numbers. Genomic DNA samples may also be amplified by PCR prior to treatment with the appropriate restriction enzyme. The fragments of different sizes are then visualized under UV light in the presence of ethidium bromide after gel electrophoresis.
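The sketch below shows how a single-base change that destroys (or creates) a recognition site alters the fragment pattern of a PCR product. EcoRI (G^AATTC, cutting between G and A on the top strand) is used purely as an example enzyme, and the 100-bp amplicons are hypothetical. The enzyme/site pairs actually relevant to the ARMP mutations are those listed in Table 5, and the function name is illustrative only.

```python
def digest(seq: str, motif: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting at every occurrence of `motif`.

    cut_offset is the top-strand cut position inside the motif
    (EcoRI, G^AATTC, cuts after the first base, so cut_offset = 1).
    """
    seq, motif = seq.upper(), motif.upper()
    cuts = []
    start = seq.find(motif)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(motif, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical 100-bp amplicons differing by one base in the site.
normal = "A" * 40 + "GAATTC" + "T" * 54
mutant = "A" * 40 + "GAGTTC" + "T" * 54  # A->G destroys the EcoRI site

print(digest(normal, "GAATTC", 1))  # [41, 59]
print(digest(mutant, "GAATTC", 1))  # [100]
```

On a gel, a heterozygous carrier in this toy case would show all three bands (100, 59 and 41 bp).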

Genetic testing based on DNA sequence differences may be achieved by detection of alterations in the electrophoretic mobility of DNA fragments in gels. Small sequence deletions and insertions can be visualized by high-resolution gel electrophoresis. Small deletions may also be detected as changes in the migration pattern of DNA heteroduplexes in non-denaturing gel electrophoresis. Alternatively, a single-base substitution may be detected on the basis of differential PCR product length. The PCR products of the normal and mutant gene can be differentially detected in acrylamide gels.

Nuclease protection assays (S1 or ligase-mediated) also reveal sequence changes at specific locations.

Alternatively, several methods may be used to confirm or detect a polymorphism: restriction mapping changes, ligated PCR, ASO, REF-SSCP, chemical cleavage, endonuclease cleavage at mismatch sites and SSCP. Both REF-SSCP and SSCP are mobility shift assays based upon the change in conformation due to mutations.

DNA fragments may also be visualized by methods in which the individual DNA samples are not immobilized on membranes. The probe and target sequences may be in solution, or the probe sequence may be immobilized. Autoradiography, radioactive decay, spectrophotometry and fluorometry may also be used to identify specific individual genotypes. Finally, mutations can be detected by direct nucleotide sequencing.
According to an embodiment of the invention, the portion of the cDNA or genomic DNA segment that is informative for a mutation can be amplified using PCR. For example, the DNA segment immediately surrounding the Cys410Tyr mutation, acquired from peripheral blood samples from an individual, can be screened using the oligonucleotide primers 885 (tggagactggaacacaac, Sequence ID No: 128) and 893 (gtgtggccagggtagagaact, Sequence ID No: 129). This region would then be amplified by PCR, and the products separated by electrophoresis and transferred to a membrane. Labelled oligonucleotide probes are then hybridized to the DNA fragments and autoradiography performed.
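The sketch below gives a quick sanity check on primers such as 885 and 893. It reports each primer's length, GC fraction and a Wallace-rule melting-temperature estimate, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). The Wallace rule is only a rough rule of thumb for short oligonucleotides and is not a method stated in this specification; nearest-neighbour models are normally used for actual primer design. The function name is illustrative only.

```python
def primer_stats(primer: str) -> tuple[int, float, int]:
    """Return (length, GC fraction, Wallace-rule Tm in degrees C)."""
    p = primer.upper()
    if set(p) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G and T")
    gc = p.count("G") + p.count("C")
    at = len(p) - gc
    return len(p), gc / len(p), 2 * at + 4 * gc


for name, seq in {"885": "tggagactggaacacaac",
                  "893": "gtgtggccagggtagagaact"}.items():
    length, gc_fraction, tm = primer_stats(seq)
    print(f"{name}: {length} nt, GC {gc_fraction:.0%}, Tm ~{tm} C")
# 885: 18 nt, GC 50%, Tm ~54 C
# 893: 21 nt, GC 57%, Tm ~66 C
```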

ARMP Expression 

As an embodiment of the present invention, the ARMP protein may be expressed using eukaryotic and prokaryotic expression systems. Eukaryotic expression systems can be used for many studies of the ARMP gene and gene product, including:

- determination of proper expression and of the post-translational modifications needed for full biological activity;
- identification of regulatory elements located in the 5' region of the ARMP gene and of their role in tissue regulation of protein expression;
- production of large amounts of the normal and mutant protein for isolation and purification;
- use of cells expressing the ARMP protein as a functional assay system for antibodies generated against the protein, to test the effectiveness of pharmacological agents, or as a component of a signal transduction system;
- study of the function of the normal complete protein, of specific portions of the protein, or of naturally occurring and artificially produced mutant proteins.

Eukaryotic and prokaryotic expression systems were generated using two different classes of ARMP cDNA sequence inserts. In the first class, termed full-length constructs, the entire ARMP cDNA sequence is inserted into the expression plasmid in the correct orientation, and includes both the natural 5' UTR and 3' UTR sequences as well as the entire open reading frame. The open reading frames bear a nucleotide sequence cassette which allows either the wild type open reading frame to be included in the expression system or, alternatively, single mutations or combinations of two mutations to be inserted into the open reading frame. This was accomplished by removing a restriction fragment from the wild type open reading frame using the enzymes NarI and PflMI. This fragment was replaced with a similar fragment generated by reverse transcriptase PCR, which bears the nucleotide sequence encoding either the Met146Leu mutation or the His163Arg mutation. A second restriction fragment was removed from the wild type nucleotide sequence of the open reading frame by cleavage with the enzymes PflMI and NcoI. It was replaced with restriction fragments bearing the nucleotide sequence encoding one of the Ala246Glu, Ala260Val, Ala285Val, Leu286Val, Leu392Val or Cys410Tyr mutations. Finally, a third variant bearing the Met146Leu or His163Arg mutation in tandem with one of the remaining mutations was made by linking the NarI-PflMI fragment bearing these mutations and the PflMI-NcoI fragments bearing the remaining mutations.
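The cassette design described above allows a mutation carried on the NarI-PflMI fragment to be paired with any mutation carried on the PflMI-NcoI fragment. The sketch below simply enumerates the wild-type, single-mutant and double-mutant combinations that this two-cassette scheme permits. It does not imply that every combination was actually constructed, and the list and function names are hypothetical.

```python
from itertools import product

# Mutations carried on each exchangeable restriction fragment, as listed in the text.
NARI_PFLMI = ["Met146Leu", "His163Arg"]
PFLMI_NCOI = ["Ala246Glu", "Ala260Val", "Ala285Val",
              "Leu286Val", "Leu392Val", "Cys410Tyr"]


def possible_constructs() -> list[tuple[str, ...]]:
    """Wild type, every single mutant, and every cross-cassette double mutant."""
    constructs: list[tuple[str, ...]] = [()]  # empty tuple = wild type
    constructs += [(m,) for m in NARI_PFLMI + PFLMI_NCOI]
    constructs += list(product(NARI_PFLMI, PFLMI_NCOI))
    return constructs


all_constructs = possible_constructs()
print(len(all_constructs))  # 21 = 1 wild type + 8 single + 12 double
```

Because the two cassettes occupy non-overlapping restriction fragments, any pairing is just a Cartesian product of the two mutation lists, which is what `itertools.product` expresses.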

The second class of cDNA inserts, bearing wild type or mutant cDNA sequences, was constructed by removing the 5' UTR and part of the 3' UTR sequences from the full-length cDNA. The 5' UTR sequence was replaced with a synthetic oligonucleotide containing a KpnI restriction site and a Kozak initiation site (oligonucleotide 969: ggtaccgccaccatgacagaggtacctgcac, Sequence ID No: 138). The 3' UTR was replaced with an oligonucleotide corresponding to position 2566 of the cDNA and bearing an artificial EcoRI site (oligonucleotide 970: gaattcactggctgtagaaaaagac, Sequence ID No: 139). Mutant variants of this construct were then made by inserting the same mutant sequences described above at the NarI-PflMI and PflMI-NcoI sites.

For eukaryotic expression, these various cDNA constructs bearing the wild type and mutant sequences described above were cloned into the expression vector pZeoSV (Invitrogen). For prokaryotic expression, two constructs have been made using the glutathione S-transferase (GST) fusion vector pGEX-KG.

In the first construct, the inserts attached to the GST fusion nucleotide sequence are the same nucleotide sequences described above (generated with the oligonucleotide primers 969, Sequence ID No: 138, and 970, Sequence ID No: 139). These inserts bear either the normal open reading frame nucleotide sequence or a combination of single and double mutations as described above. This construct allows expression of the full-length protein in mutant and wild type variants in prokaryotic cell systems as a GST fusion protein. The full-length protein can then be purified, and the GST fusion product removed by thrombin digestion.

The second prokaryotic cDNA construct was generated to create a fusion protein with the same vector. It allows the production of the amino acid sequence corresponding to the hydrophilic acidic loop domain between TM6 and TM7 of the full-length protein, either as a wild type nucleotide sequence (thus a wild type amino acid sequence for fusion proteins) or as a mutant sequence bearing the Ala285Val, Leu286Val or Leu392Val mutation. This was accomplished by recovering wild type or mutant sequence from appropriate sources of RNA using the oligonucleotide primers 989 (ggatccggtccacttcgtatgctg, Sequence ID No: 140) and 990 (ttttttgaattcttaggctatggttgtgttcca, Sequence ID No: 141). This allows cloning of the appropriate mutant or wild type nucleotide sequence corresponding to the hydrophilic acidic loop domain at the BamHI and EcoRI sites within the pGEX-KG vector.
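The engineered restriction sites in the four cloning oligonucleotides can be checked mechanically. The sketch below scans each oligonucleotide quoted above for the KpnI (GGTACC), EcoRI (GAATTC) and BamHI (GGATCC) recognition sequences. Because all three sites are palindromic, scanning one strand is sufficient. Offsets are 0-based, and the dictionary and function names are illustrative only.

```python
# Recognition sequences (all palindromic, so one strand suffices).
SITES = {"KpnI": "GGTACC", "EcoRI": "GAATTC", "BamHI": "GGATCC"}

# Oligonucleotides as quoted in the text (Sequence ID Nos. 138-141).
OLIGOS = {
    "969": "ggtaccgccaccatgacagaggtacctgcac",
    "970": "gaattcactggctgtagaaaaagac",
    "989": "ggatccggtccacttcgtatgctg",
    "990": "ttttttgaattcttaggctatggttgtgttcca",
}


def find_sites(seq: str) -> list[tuple[str, int]]:
    """Return (enzyme, 0-based offset) for every recognition site in seq."""
    seq = seq.upper()
    hits = []
    for enzyme, motif in SITES.items():
        start = seq.find(motif)
        while start != -1:
            hits.append((enzyme, start))
            start = seq.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


for name, oligo in OLIGOS.items():
    print(name, find_sites(oligo))
```

Run on oligonucleotide 969, the scan reports the expected KpnI site at offset 0. It also reports a second GGTACC at offset 20, downstream of the ATG (offset 12) that follows the GCCACC Kozak context. The text does not say whether that internal site mattered for the cloning.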

These prokaryotic expression systems allow the holo-protein or various important functional domains of the protein to be recovered as fusion proteins. These can then be used for binding studies, structural studies, functional studies, and for the generation of appropriate antibodies.

Expression of the ARMP gene in heterologous cell systems can be used to demonstrate structure-function relationships. Ligating the ARMP DNA sequence into a plasmid expression vector to transfect cells is a useful method to test the protein's influence on various cellular biochemical parameters. Plasmid expression vectors containing the entire normal or mutant human or mouse ARMP sequence, or portions thereof, can be used in in vitro mutagenesis experiments which will identify portions of the protein crucial for regulatory function.

The DNA sequence can be manipulated in studies to understand the expression of the gene and its product, and to achieve production of large quantities of the protein for functional analysis, for antibody production, and for patient therapy. The changes in the sequence may or may not alter the expression pattern in terms of relative quantities, tissue-specificity and functional properties. Partial or full-length DNA sequences which encode the ARMP protein, modified or unmodified, may be ligated to bacterial expression vectors. E. coli can be used with a variety of expression vector systems, eg. the T7 RNA polymerase/promoter system using two plasmids, or by labeling of plasmid-encoded proteins, or by expression by infection with M13 phage mGPI-2. E. coli vectors can also be used with phage lambda regulatory sequences, with fusion protein vectors (eg. lacZ and trpE), with maltose-binding protein fusions, and with glutathione-S-transferase fusion proteins. All of these, together with many other prokaryotic expression systems, are widely available commercially.

Alternatively, the ARMP protein can be expressed in insect cells using baculoviral vectors, or in mammalian cells using vaccinia virus or specialised eukaryotic expression vectors. For expression in mammalian cells, the cDNA sequence may be ligated to heterologous promoters, such as the simian virus 40 (SV40) promoter in the pSV2 vector and other similar vectors. The construct is then introduced into cultured eukaryotic cells, such as COS cells, to achieve transient or long-term expression. The stable integration of the chimeric gene construct may be maintained in mammalian cells by biochemical selection, such as with neomycin and mycophenolic acid.

The ARMP DNA sequence can be altered using procedures such as restriction enzyme digestion, fill-in with DNA polymerase, deletion by exonuclease, extension by terminal deoxynucleotide transferase, ligation of synthetic or cloned DNA sequences, and site-directed sequence alteration with the use of specific oligonucleotides together with PCR.

The cDNA sequence or portions thereof, or a minigene consisting of a cDNA with an intron and its own promoter, is introduced into eukaryotic expression vectors by conventional techniques. These vectors permit the transcription of the cDNA in eukaryotic cells by providing regulatory sequences that initiate and enhance the transcription of the cDNA and ensure its proper splicing and polyadenylation. The endogenous ARMP gene promoter can also be used. Different promoters within vectors have different activities, which alter the level of expression of the cDNA. In addition, certain promoters can also modulate function, such as the glucocorticoid-responsive promoter from the mouse mammary tumor virus.
5 Some of the vectors listed contain selectable markers or neo bacterial genes 

that permit isolation of cells by chemical selection. Stable long-term vectors can be 
maintained in cells as episomal, freely replicating entities by using regulatory 
elements of viruses. Cell lines can also be produced which have integrated the vector 
into the genomic DNA. In this manner, the gene product is produced on a continuous 
10 basis. 

Vectors are introduced into recipient cells by various methods including
calcium phosphate, strontium phosphate, electroporation, lipofection, DEAE dextran,
microinjection, or protoplast fusion. Alternatively, the cDNA can be introduced
by infection using viral vectors.

Using the techniques mentioned, the expression vectors containing the ARMP
gene or portions thereof can be introduced into a variety of mammalian cells from
other species or into non-mammalian cells.

The recombinant cloning vector, according to this invention, comprises the
selected DNA of the DNA sequences of this invention for expression in a suitable
host. The DNA is operatively linked in the vector to an expression control sequence
in the recombinant DNA molecule so that normal and mutant ARMP protein can be
expressed. The expression control sequence may be selected from the group
consisting of sequences that control the expression of genes of prokaryotic or
eukaryotic cells and their viruses, and combinations thereof. The expression control
sequence may be selected from the group consisting of the lac system, the trp system,
the tac system, the trc system, major operator and promoter regions of phage lambda,
the control region of the fd coat protein, early and late promoters of SV40, promoters
derived from polyoma, adenovirus, retrovirus, baculovirus and simian virus, the
3-phosphoglycerate kinase promoter, yeast acid phosphatase promoters, yeast
alpha-mating factors and combinations thereof.



The host cell which may be transfected with the vector of this invention may
be selected from the group consisting of E. coli, Pseudomonas, Bacillus subtilis,
Bacillus stearothermophilus or other bacilli; other bacteria; yeast; fungi; insect, mouse
or other animal, or plant hosts; or human tissue cells.

For the mutant ARMP DNA sequence, similar systems are employed to express
and produce the mutant protein.

Antibodies to Detect ARMP 

Antibodies to epitopes within the ARMP protein can be raised to provide
information on the characteristics of the protein. Generation of antibodies would
enable the visualization of the protein in cell and tissue extracts using Western blotting. In
this technique, proteins are run on a polyacrylamide gel and then transferred onto
nitrocellulose membranes. These membranes are then incubated in the presence of
the (primary) antibody and then, following washing, are incubated with a secondary antibody
which is used for detection of the protein-primary antibody complex. Following
repeated washing, the entire complex is visualized using colourimetric or
chemiluminescent methods.

Antibodies to the ARMP protein also allow for the use of
immunocytochemistry and immunofluorescence techniques in which the proteins can
be visualized directly in cells and tissues. This is most helpful in order to establish
the subcellular location of the protein and the tissue specificity of the protein.

In order to prepare polyclonal antibodies, fusion proteins containing defined
portions or all of the ARMP protein can be synthesized in bacteria by expression of
corresponding DNA sequences in a suitable cloning vehicle. The protein can then be
purified, coupled to a carrier protein, mixed with Freund's adjuvant (to help
stimulate the antigenic response by the rabbits) and injected into rabbits or other
laboratory animals. Alternatively, protein can be isolated from cultured cells
expressing the protein. Following booster injections at bi-weekly intervals, the
rabbits or other laboratory animals are bled and the sera isolated. The sera can
be used directly or purified prior to use by various methods, including affinity
chromatography, Protein A-Sepharose, Antigen Sepharose and Anti-mouse-Ig-Sepharose.
The sera can then be used to probe protein extracts run on a polyacrylamide gel to
identify the ARMP protein. Alternatively, synthetic peptides can be made to the
antigenic portions of the protein and used to inoculate the animals.

To produce monoclonal ARMP antibodies, cells actively expressing the protein
are cultured or isolated from tissues and the cell membranes isolated. The
membranes, extracts, or recombinant protein extracts containing the ARMP protein
are injected in Freund's adjuvant into mice. After being injected 9 times over a three-
week period, the mouse spleens are removed and resuspended in phosphate buffered
saline (PBS). The spleen cells serve as a source of lymphocytes, some of which
produce antibody of the appropriate specificity. These are then fused with a
permanently growing myeloma partner cell, and the products of the fusion are plated
into a number of tissue culture wells in the presence of a selective agent such as
HAT. The wells are then screened by ELISA to identify those containing cells making
useful antibody. These are then freshly plated. After a period of growth, these
wells are again screened to identify antibody-producing cells. Several cloning
procedures are carried out until over 90% of the wells contain single clones which
are positive for antibody production. From this procedure a stable line of clones is
established which produce the antibody. The monoclonal antibody can then be
purified by affinity chromatography using Protein A Sepharose, ion-exchange
chromatography, as well as variations and combinations of these techniques.

In situ hybridization is another method used to detect the expression of ARMP
protein. In situ hybridization relies upon the hybridization of a specifically labelled
nucleic acid probe to the cellular RNA in individual cells or tissues. Therefore, it
allows the identification of mRNA within intact tissues, such as the brain. In this
method, oligonucleotides corresponding to unique portions of the ARMP gene are
used to detect specific mRNA species in the brain.
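Because the probe must correspond to a unique portion of the ARMP gene, a simple computational pre-screen is to confirm that a candidate oligonucleotide (or its reverse complement) occurs exactly once among the sequences it could hybridize to. The following is a minimal sketch assuming exact-match searching only; the sequence names and bases are hypothetical placeholders, not ARMP sequence, and a practical design would also tolerate mismatches and consider homologues such as E5-1.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def count_overlapping(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, allowing overlaps."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def probe_hits(probe: str, sequences: dict[str, str]) -> dict[str, int]:
    """Exact matches of the probe on either strand, per sequence."""
    probe = probe.upper()
    rc = reverse_complement(probe)
    hits = {}
    for name, seq in sequences.items():
        seq = seq.upper()
        n = count_overlapping(seq, probe)
        if rc != probe:  # avoid double-counting a palindromic probe
            n += count_overlapping(seq, rc)
        if n:
            hits[name] = n
    return hits


def is_unique(probe: str, sequences: dict[str, str]) -> bool:
    """True if the probe matches exactly one site across all sequences."""
    return sum(probe_hits(probe, sequences).values()) == 1


# Hypothetical toy sequences standing in for a target and a homologue.
SEQUENCES = {
    "target_hypothetical": "ATGACCGGTTCAAGGCTTACGATCGA",
    "homologue_hypothetical": "ATGACCGGTTCAAGGTTTTCGATCGA",
}

print(is_unique("GGCTTACG", SEQUENCES))  # True: found only in the target
print(is_unique("ATGACCGG", SEQUENCES))  # False: shared with the homologue
```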

In this method a rat is anesthetized and transcardially perfused with cold PBS,
followed by perfusion with a formaldehyde solution. The brain or other tissues are
then removed, frozen in liquid nitrogen, and cut into thin micron-scale sections. The
sections are placed on slides and incubated in proteinase K. Following rinsing in
DEPC-treated water and ethanol, the slides are placed in prehybridization buffer. A
radioactive probe corresponding to the primer is made by nick translation and
incubated with the sectioned brain tissue. After incubation and air drying, the labeled
areas are visualized by autoradiography. Dark spots on the tissue sample indicate
hybridization of the probe with brain mRNA, which demonstrates expression of the
gene.

Antibodies may also be used coupled to compounds for diagnostic and/or 
therapeutic uses such as radionuclides for imaging and therapy and liposomes for the 
targeting of compounds to a specific tissue location. 

Isolation and Purification of E5-1 Protein

The E5-1 protein may be isolated and purified by the types of methods
described above for the ARMP protein.

The protein may also be prepared by expression of the E5-1 cDNA described
herein in a suitable host. The protein is preferably expressed as a fusion protein by
ligating its encoding cDNA sequence to a vector containing the coding sequence for
another suitable peptide, eg. GST. The fusion protein is expressed and recovered
from prokaryotic cells such as bacteria, from baculovirus-infected insect cells, or
from other eukaryotic cells. Antibodies to ARMP, by virtue of portions of amino acid
sequence identity with E5-1, can be used to purify, attract and bind to E5-1 protein,
and vice versa.

Transgenic Mouse Model of E5-1 Related Alzheimer's Disease

An animal model of Alzheimer's disease related to mutations of the E5-1 gene 
may be created by methods analogous to those described above for the ARMP gene. 

Antibodies 

Due to its structural similarity with the ARMP, the E5-1 protein may be used
for the development of probes, peptides, or antibodies to various peptides within it
which may recognize both the E5-1 and the ARMP genes and gene products. As a
protein homologue of the ARMP, the E5-1 protein may be used
as a replacement for a defective ARMP gene product. It may also be used to
elucidate functions of the ARMP gene in tissue culture, and vice versa.

Screening for Alzheimer's Disease linked to Chromosome 1

Screening for Alzheimer's Disease linked to mutations of the E5-1 gene may
now be conveniently carried out.

General screening methods are described above in relation to the described
mutations in the ARMP gene. These described methods can be readily applied and
adapted to detection of the described chromosome 1 mutations, as will be readily
understood by those skilled in the art.

In accordance with one embodiment of the invention, the Asn141Ile mutation
is screened for by PCR amplification of the surrounding DNA fragment using the
primers:

1041: 5'-cattcactgaggacacacc (end-labelled) and

1042: 5'-tgtagagcaccaccaaga (unlabelled)

Any tissue with nucleated cells may be examined. The amplified products are
separated by electrophoresis, and an autoradiogram of the gel is prepared and
examined for mutant bands.

In accordance with a further embodiment, the Met239Val mutation is screened
for by PCR amplification of its surrounding DNA fragment using the primers:

1034: 5'-gcatggtgtgcatccact and

1035: 5'-ggaccactctgggaggta
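A quick sanity check on the primer pairs printed above is to compare their length, GC content and approximate melting temperature, since the two primers of a pair are normally chosen to anneal at similar temperatures. The sketch below uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a rough approximation for short oligonucleotides; that rule is our assumption here, not a method stated in this specification. The sequences are copied from the primers above.

```python
# Primer sequences (5'->3') copied from the two screening embodiments above.
PRIMERS = {
    "1041": "cattcactgaggacacacc",
    "1042": "tgtagagcaccaccaaga",
    "1034": "gcatggtgtgcatccact",
    "1035": "ggaccactctgggaggta",
}


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Approximate Tm in deg C by the Wallace rule: 2(A+T) + 4(G+C).

    A rule of thumb for short oligonucleotides only; it ignores salt,
    primer concentration and nearest-neighbour effects.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for pair in (("1041", "1042"), ("1034", "1035")):
    for name in pair:
        seq = PRIMERS[name]
        print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.2f}, Tm ~{wallace_tm(seq)} C")
    tms = [wallace_tm(PRIMERS[n]) for n in pair]
    print(f"  Tm difference within pair: {abs(tms[0] - tms[1])} C")
```

By this estimate the Asn141Ile pair (1041/1042) differs by about 4 °C and the Met239Val pair (1034/1035) by about 2 °C; a nearest-neighbour calculation under the actual buffer conditions would be needed before relying on these figures.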



The same primer sets may be used to detect the mutations by means of other
methods such as SSCP, chemical cleavage, DGGE, nucleotide sequencing, ligase
chain reaction and allele-specific oligonucleotides. As will be understood by those
skilled in the art, other suitable primer pairs may be devised and used.
The amplified products are separated and an autoradiogram prepared as
described above to detect mutant bands.

In inherited cases as the primary event, and in non-inherited cases as a
secondary event due to the disease state, abnormal processing of E5-1, ARMP, APP,
or proteins reacting with E5-1, APP or ARMP may occur. This can be detected as
abnormal phosphorylation, glycosylation, glycation, amidation or proteolytic cleavage
products in body tissues or fluids, eg. CSF or blood.

Therapies 

An important aspect of the biochemical studies using the genetic information
of this invention is the development of therapies to circumvent or overcome the
ARMP gene defect, and thus prevent, treat, control serious symptoms of, or cure the
disease. In view of the expression of the ARMP gene in a variety of tissues, one has to
recognize that Alzheimer's Disease may not be restricted to the brain. Alzheimer's
Disease manifests itself as a neurological disorder which in one of its forms is caused
by a mutation in the ARMP gene, but such manifestations may be caused by mutations
acting in other organ tissues, such as the liver, which release factors that affect brain
activity and ultimately cause Alzheimer's Disease. Hence, in considering various
therapies, it is understood that such therapies may be targeted at tissues other than the
brain, such as heart, placenta, lung, liver, skeletal muscle, kidney and pancreas,
where ARMP is also expressed.

The effect of these mutations in E5-1 and ARMP is a gain of a novel function
which causes aberrant processing of Amyloid Precursor Protein (APP) into Aβ
peptide, abnormal phosphorylation homeostasis, and abnormal apoptosis. Therapy
to reverse this may use small molecules (drugs), recombinant proteins, etc., which block
the aberrant function by altering the structure of the mutant protein, enhancing its
metabolic clearance, inhibiting binding of ligands to the mutant protein, or
inhibiting the channel function of the mutant protein. The same effect might be
gained by inserting a second mutant protein by gene therapy, similar to the correction
of the deg-1(d) and mec-4(d) mutations in C. elegans by insertion of mutant
transgenes. Alternatively, overexpression of wild type E5-1 protein or wild type ARMP
or both may correct the defect. This could be achieved by the administration of drugs
or proteins to induce the transcription and translation, or inhibit the catabolism, of the
native E5-1 and ARMP proteins. It could also be accomplished by infusion of recombinant
proteins or by gene therapy with vectors causing expression of the normal protein at
a high level.

Rationale for Therapeutic, Diagnostic, and Investigational Applications of the ARMP
Gene and Gene Products as They Relate to the Amyloid Precursor Protein

The Aβ peptide derivatives of APP are neurotoxic (Selkoe et al., 1994). APP
is metabolized by passage through the Golgi network and then to secretory pathways
via clathrin-coated vesicles, with subsequent passage to the plasma membrane, where
the mature APP is cleaved by α-secretase to a soluble fraction (Protease Nexin II)
plus a non-amyloidogenic C-terminal peptide (Selkoe et al., 1995; Gandy et al., 1993).
Alternatively, mature APP can be directed to the endosome-lysosome pathway, where
it undergoes β- and γ-secretase cleavage to produce the Aβ peptides. The
phosphorylation state of the cell determines the relative balance of the α-secretase
(non-amyloidogenic) and Aβ (amyloidogenic) pathways (Gandy et al., 1993).
The phosphorylation state of the cell can be modified pharmacologically by phorbol
esters, muscarinic agonists and other agents, and appears to be mediated by cytosolic
factors (especially protein kinase C) acting upon an integral membrane protein in the
Golgi network, which we propose to be the ARMP, and members of the homologous
family (all of which carry several phosphorylation consensus sequences for protein
kinase C). Mutations in the ARMP gene will cause alterations in the structure and
function of the ARMP gene product, leading to defective interactions with regulatory
elements (eg. protein kinase C) or with APP, thereby promoting the direction of APP
to the amyloidogenic endosome-lysosome pathway. Environmental factors (viruses,
toxins, aging, etc.) may also have similar effects on ARMP. To treat Alzheimer's
disease, the phosphorylation state of ARMP can be altered by chemical and
biochemical agents (eg. drugs, peptides and other compounds) which alter the activity
of protein kinase C and other protein kinases, which alter the activity of protein
phosphatases, or which modify the availability of ARMP to be post-translationally
modified. The interactions of kinases and phosphatases with the ARMP gene
products (and the products of its homologues), and the interactions of the ARMP gene
products with other proteins involved in the trafficking of APP within the Golgi
network, can be modulated to decrease trafficking of Golgi vesicles to the
endosome-lysosome pathway, the route that promotes Aβ peptide production. Such
compounds will include: peptide analogues of APP, ARMP, and homologues of
ARMP as well as other interacting proteins, lipids, sugars, and agents which promote
differential glycosylation of ARMP and its homologues; agents which alter the
biologic half-life of messenger RNA or protein of ARMP and homologues, including
antibodies and antisense oligonucleotides; and agents which act upon ARMP
transcription.

The effect of these agents in cell lines and whole animals can be monitored
by examining: transcription; translation; post-translational modification of ARMP (eg.
phosphorylation or glycosylation); and intracellular trafficking of ARMP and its
homologues through various intracellular and extracellular compartments. Methods
for these studies include Western and Northern blots; immunoprecipitation after
metabolic labelling (pulse-chase) with radio-labelled methionine and ATP; and
immunohistochemistry. The effect of these agents can also be monitored using
studies which examine the relative binding affinities and relative amounts of ARMP
gene products involved in interactions with protein kinase C and/or APP, using either
standard binding affinity assays or co-precipitation and Western blots with antibodies
to protein kinase C, APP or ARMP and its homologues. The effect of these agents
can also be monitored by assessing the production of Aβ peptides by ELISA before
and after exposure to the putative therapeutic agent (Huang et al., 1993). The effect
can also be monitored by assessing the viability of cell lines after exposure to
aluminum salts and to Aβ peptides, which are thought to be neurotoxic in Alzheimer's
disease. Finally, the effect of these agents can be monitored by assessing the
cognitive function of animals bearing: their normal genotype at APP or ARMP
homologues; human APP transgenes (with or without mutations);
human ARMP transgenes (with or without mutations); or a combination of
all of these.

Rationale for Therapeutic, Diagnostic, and Investigational Applications of the ARMP
Gene, the E5-1 Gene and Their Products

The ARMP gene product and the E5-1 gene product have amino acid sequence
homology to human ion channel proteins and receptors. For instance, the E5-1
protein shows substantial homology to the human sodium channel α-subunit (E = 0.18,
P = 0.16, identities = 22-27% over two regions of at least 35 amino acid residues)
using the BLASTP paradigm of Altschul et al. (1990). Other diseases (such as
malignant hyperthermia and hypokalemic periodic paralysis in humans, and the
neurodegeneration of mechanosensory neurons in C. elegans) arise through mutations
in ion channels or receptor proteins. Mutation of the ARMP gene or the E5-1 gene
could affect similar functions and lead to Alzheimer's disease and other psychiatric
and neurological diseases. Based upon this, a test for Alzheimer's disease can be
produced to detect an abnormal receptor or an abnormal ion channel function related
to abnormalities that are acquired or inherited in the ARMP gene and its product, or
in one of the homologous genes such as E5-1 and their products. This test can be
accomplished either in vivo or in vitro by measurements of ion channel fluxes and/or
transmembrane voltage or current fluxes using patch clamp, voltage clamp and
fluorescent dyes sensitive to intracellular calcium or transmembrane voltage.
Defective ion channel or receptor function can also be assayed by measurements of
activation of second messengers such as cyclic AMP, cGMP, tyrosine kinases,
phosphatases, increases in intracellular Ca2+ levels, etc. Recombinantly made proteins
may also be reconstituted in artificial membrane systems to study ion channel
conductance. Therapies which affect Alzheimer's disease (due to acquired/inherited
defects in the ARMP gene or E5-1 gene; due to defects in other pathways leading to
this disease, such as mutations in APP; and due to environmental agents) can be tested
by analysis of their ability to modify an abnormal ion channel or receptor function
induced by mutation in the ARMP gene or in one of its homologues. Therapies could
also be tested by their ability to modify the normal ion channel or
receptor function of the ARMP gene products and its homologues. Such assays can
be performed on cultured cells expressing endogenous normal or mutant ARMP
genes/gene products or E5-1 genes/gene products. Such studies can also be performed
on cells transfected with vectors capable of expressing ARMP, parts of the
ARMP gene and gene product, mutant ARMP, E5-1 gene, parts of the E5-1 gene and
gene product, mutant E5-1 gene or another homologue in normal or mutant form.
Therapies for Alzheimer's disease can be devised to modify an abnormal ion channel
or receptor function of the ARMP gene or E5-1 gene. Such therapies can be
conventional drugs, peptides, sugars, or lipids, as well as antibodies or other ligands
which affect the properties of the ARMP or E5-1 gene product. Such therapies can
also be performed by direct replacement of the ARMP gene and/or E5-1 gene by gene
therapy. In the case of an ion channel, the gene therapy could be performed using
either mini-genes (cDNA plus a promoter) or genomic constructs bearing genomic
DNA sequences for parts or all of the ARMP gene. Mutant ARMP or homologous
gene sequences might also be used to counter the effect of the inherited or acquired
abnormalities of the ARMP gene, as has recently been done for replacement of
mec-4 and deg-1 in C. elegans (Huang and Chalfie, 1994). The therapy might also
be directed at augmenting the receptor or ion channel function of homologous
genes such as the E5-1 gene, so that it may potentially take over the functions
of the ARMP gene rendered defective by acquired or inherited defects. Therapy
using antisense oligonucleotides to block the expression of the mutant ARMP gene
or the mutant E5-1 gene, coordinated with gene replacement with normal ARMP or
E5-1 gene, can also be applied using standard techniques of either gene therapy or
protein replacement therapy.
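The two statistics quoted for the E5-1/sodium channel comparison (E = 0.18, P = 0.16) can be checked against each other. Under the Karlin-Altschul statistics used by BLAST, the number of chance alignments scoring at least as well is treated as Poisson-distributed with mean E, so the probability of seeing at least one is P = 1 - e^(-E). The sketch below applies that relation; it is a consistency check added for illustration, not part of the original analysis.

```python
import math


def evalue_to_pvalue(e_value: float) -> float:
    """Probability of at least one chance hit, P = 1 - exp(-E).

    Assumes the Poisson model of Karlin-Altschul statistics; for small E,
    P is approximately equal to E.
    """
    if e_value < 0:
        raise ValueError("E-value must be non-negative")
    return 1.0 - math.exp(-e_value)


p = evalue_to_pvalue(0.18)
print(f"E = 0.18 -> P = {p:.3f}")  # prints: E = 0.18 -> P = 0.165
```

The result, about 0.165, agrees with the reported P = 0.16 and corresponds to roughly a one-in-six chance that an alignment scoring this well would arise by chance in the search.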



Protein Therapy 

Treatment of Alzheimer's Disease can be performed by replacing the mutant
protein with normal protein, or by modulating the function of the mutant protein.
Once the biological pathway of the ARMP protein has been completely understood,
it may also be possible to modify the pathophysiologic pathway (eg. a signal
transduction pathway) in which the protein participates in order to correct the
physiological defect.

To replace the mutant protein with normal protein, or with a protein bearing
a deliberate counterbalancing mutation, it is necessary to obtain large amounts of pure
ARMP protein or E5-1 protein from cultured cell systems which can express the
protein. Delivery of the protein to the affected brain areas or other tissues can then
be accomplished using appropriate packaging or administration systems.

Gene Therapy 

Gene therapy is another potential therapeutic approach in which normal copies
of the ARMP gene are introduced into patients so that they code for normal protein
in several different affected cell types. The gene must be delivered to those cells in
a form in which it can be taken up and code for sufficient protein to provide effective
function. Alternatively, in some neurologic mutants it has been possible to prevent
disease by introducing another copy of the homologous gene bearing a second
mutation in that gene, by altering the mutation, or by using another gene to block its effect.

Retroviral vectors can be used for somatic cell gene therapy, especially because
of their high efficiency of infection and stable integration and expression. The
targeted cells, however, must be able to divide, and the expression level of the
normal protein should be high because the disease is a dominant one. The full length
ARMP gene can be cloned into a retroviral vector and driven from its endogenous
promoter, from the retroviral long terminal repeat or from a promoter specific for
the target cell type of interest (such as neurons).

Other viral vectors which can be used include adeno-associated virus, vaccinia 
virus, bovine papilloma virus, or a herpesvirus such as Epstein-Barr virus. 

Gene transfer could also be achieved using non-viral means requiring transfection
in vitro. These include calcium phosphate, DEAE dextran, electroporation, and
protoplast fusion. Liposomes may also be potentially beneficial for delivery of DNA
into a cell. Although these methods are available, many of them have lower
efficiency.

Antisense-based strategies can be employed to explore ARMP gene function
and as a basis for therapeutic drug design. The principle is based on the hypothesis
that sequence-specific suppression of gene expression can be achieved by intracellular
hybridization between mRNA and a complementary antisense species. The formation
of a hybrid RNA duplex may then interfere with the processing, transport, translation
and/or stability of the target ARMP mRNA. Hybridization is required for the
antisense effect to occur; however, the efficiency of intracellular hybridization is low,
and therefore such an event may not be very successful.
Antisense strategies may use a variety of approaches, including the use of antisense
oligonucleotides, injection of antisense RNA and transfection of antisense RNA
expression vectors. Antisense effects can also be induced by control (sense) sequences;
however, the extent of phenotypic change is highly variable. Phenotypic effects
induced by antisense effects are based on changes in criteria such as protein levels,
protein activity measurements, and target mRNA levels. Multidrug resistance is a
useful model to study molecular events associated with phenotypic changes due to
antisense effects, since the multidrug resistance phenotype can be established by
expression of a single gene, mdr1 (MDR gene), encoding P-glycoprotein.
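The complementarity on which the antisense approach depends can be made concrete: an antisense oligonucleotide is the reverse complement of its target mRNA segment, so that the two strands pair antiparallel. The minimal sketch below derives a DNA antisense oligonucleotide from an RNA segment and checks Watson-Crick pairing. The 12-nucleotide segment is a hypothetical placeholder, not a sequence from the ARMP transcript, and practical oligonucleotide design would also weigh target accessibility, backbone chemistry and off-target matches.

```python
# Watson-Crick partner in a DNA antisense oligo for each RNA base.
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}

# Allowed (oligo base, RNA base) pairs.
WATSON_CRICK = {("T", "A"), ("A", "U"), ("C", "G"), ("G", "C")}


def antisense_oligo(mrna_segment: str) -> str:
    """DNA antisense oligonucleotide (5'->3') for an RNA segment given 5'->3'."""
    return "".join(RNA_TO_DNA_PARTNER[b] for b in reversed(mrna_segment.upper()))


def pairs_with(oligo: str, mrna_segment: str) -> bool:
    """True if the oligo, read 5'->3', pairs base-for-base with the RNA read 3'->5'."""
    oligo, seg = oligo.upper(), mrna_segment.upper()
    if len(oligo) != len(seg):
        return False
    return all((o, r) in WATSON_CRICK for o, r in zip(oligo, reversed(seg)))


target = "AUGGCUUACGAU"  # hypothetical mRNA segment, 5'->3'
oligo = antisense_oligo(target)
print(oligo)                      # ATCGTAAGCCAT
print(pairs_with(oligo, target))  # True
```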

Transplantation of normal genes into the affected area of the patient can also
be a useful therapy for Alzheimer's Disease. In this procedure, a normal hARMP
gene is transferred into a cultivatable cell type such as glial cells, either
exogenously or endogenously to the patient. These cells are then injected
stereotactically into the disease-affected tissue(s). This is a known treatment for
Parkinson's disease.

Immunotherapy is also possible for Alzheimer's Disease. Antibodies can be
raised to a mutant ARMP protein (or portion thereof) and then be administered to
bind or block the mutant protein and its deleterious effects. Simultaneously,
expression of the normal protein product could be encouraged. Administration could
be in the form of a one-time immunogenic preparation or vaccine immunization. An
immunogenic composition may be prepared as injectables, as liquid solutions or
emulsions. The ARMP protein may be mixed with pharmaceutically acceptable
excipients compatible with the protein. Such excipients may include water, saline,
dextrose, glycerol, ethanol and combinations thereof. The immunogenic composition
and vaccine may further contain auxiliary substances such as emulsifying agents or
adjuvants to enhance effectiveness. Immunogenic compositions and vaccines may be
administered parenterally by injection, subcutaneously or intramuscularly.

The immunogenic preparations and vaccines are administered in such amounts
as will be therapeutically effective, protective and immunogenic. Dosage depends on
the route of administration and will vary according to the size of the host.

Similar gene therapy techniques may be employed with respect to the E5-1
gene.

The above disclosure generally describes the present invention. A more
complete understanding can be obtained by reference to the following specific
examples. These examples are described solely for purposes of illustration and are
not intended to limit the scope of the invention. Changes in form and substitution
of equivalents are contemplated as circumstances may suggest or render expedient.
Although specific terms have been employed herein, such terms are intended in a
descriptive sense and not for purposes of limitation.

Example 1. Development of the genetic, physical "contig" and transcriptional map
of the minimal co-segregating region

The CEPH MegaYAC and the RPCI PAC human total genomic DNA libraries 
were searched for clones containing genomic DNA fragments from the AD3 region 

5 of chromosome 14q24.3 using oligonucleotide probes for each of the SSR marker 
loci used in the genetic linkage studies as well as ## additional markers depicted in 
Figure la (Albertsen et al., 1990; Chumakov et al., 1992; Ioannu et aL, 1994). The 
genetic map distances between each marker are depicted above the contig, and are 
derived from published data (NK/CEFH Collaborative Mapping Group, 1992; Wang, 

10 1992; Weissenbach, J et al., 1992; Qyapay, Q et al., 1994). Cones recovered for 
each of the initial marker loci were arranged into an ordered series of partially 
overlapping clones ("contig") using four independent methods. First, sequences 
representing the ends of the YAC insert were isolated by inverse PCR (Riley et al. , 
1990), and hybridized to Southern blot panels containing restriction digests of DNA 

from all of the YAC clones recovered for all of the initial loci in order to identify
other YAC clones bearing overlapping sequences. Second, inter-Alu PCR was
performed on each YAC, and the resultant band patterns were compared across the
pool of recovered YAC clones in order to identify other clones bearing overlapping
sequences (Bellanné-Chartelot et al., 1992; Chumakov et al., 1992). Third, to
improve the specificity of the Alu-PCR fingerprinting, we digested the YAC DNA
with HaeIII or RsaI, amplified the restriction products with both Alu and L1H
consensus primers, and resolved the products by polyacrylamide gel electrophoresis.
Finally, as additional STSs were generated during the search for transcribed
sequences, these STSs were also used to identify overlaps. The resultant contig was
complete except for a single discontinuity between YAC932C7, bearing D14S53, and
YAC74dB4, containing D14S61. The physical map order of the STSs within the contig
was largely in accordance with the genetic linkage map for this region (NIH/CEPH
Collaborative Mapping Group, 1992; Wang, Z and Weber, J.L., 1992; Weissenbach, J
et al., 1992; Gyapay, G et al., 1994). However, as with the genetic maps, we were
unable to resolve unambiguously the relative order of the loci within the
D14S43/D14S71 cluster and the D14S76/D14S273 cluster. PAC1 clones suggest that
D14S277 is telomeric to D14S268, whereas genetic maps have suggested the reverse
order. Furthermore, a few STS probes failed to detect hybridization patterns in at
least one YAC clone which, on the basis of the most parsimonious consensus physical
map and of the genetic map, would have been predicted to contain that STS. For
instance, the D14S268 (AFM265) and RSCAT7 STSs are absent from YAC788H12
(Figure 3). Because these results were reproducible and occurred with several
different STS markers, they most likely reflect the presence of small
interstitial deletions within one of the YAC clones.
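The STS-content part of the overlap search can be pictured as a graph problem: clones are linked whenever they score positive for the same STS, and each connected group of clones forms a contig. The sketch below illustrates only that idea, assuming that any shared STS indicates overlap; it ignores the Alu/L1H fingerprint comparisons, does not order the STSs, and does not model false negatives such as the deletions noted above. The function name and the clone and STS names are hypothetical. A gap like the one between YAC932C7 and YAC74dB4 would appear as two separate groups.

```python
from collections import defaultdict


def sts_contigs(content: dict[str, set[str]]) -> list[set[str]]:
    """Group clones into contigs by shared STS content (union-find).

    `content` maps each clone name to the set of STSs it scored positive for.
    Two clones fall in the same contig if a chain of shared STSs connects them.
    """
    parent = {clone: clone for clone in content}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Index clones by STS, then link all clones that carry the same STS.
    clones_by_sts: dict[str, list[str]] = defaultdict(list)
    for clone, stss in content.items():
        for sts in stss:
            clones_by_sts[sts].append(clone)
    for clones in clones_by_sts.values():
        for other in clones[1:]:
            union(clones[0], other)

    # Collect clones under their root to form the contigs.
    contigs: dict[str, set[str]] = defaultdict(set)
    for clone in content:
        contigs[find(clone)].add(clone)
    return list(contigs.values())


if __name__ == "__main__":
    # Hypothetical STS content for four clones.
    example = {
        "yacA": {"sts1", "sts2"},
        "yacB": {"sts2", "sts3"},
        "yacC": {"sts3"},
        "yacD": {"sts4"},
    }
    for contig in sts_contigs(example):
        print(sorted(contig))
```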


Example 2. Cumulative two-point lod scores for chromosome 14q24.3 markers.

Genotypes at each polymorphic microsatellite marker locus were determined
by PCR from 100 ng of genomic DNA of all available affected and unaffected
pedigree members, as previously described (St George-Hyslop, P et al., 1992), using
primer sequences specific for each microsatellite locus (Weissenbach, J et al., 1992;
Gyapay, G et al., 1994). The normal population frequency of each allele was
determined using spouses and other neurologically normal subjects from the same
ethnic groups; these frequencies did not differ significantly from those established
for mixed Caucasian populations (Weissenbach, J et al., 1992; Gyapay, G et al., 1994).
The maximum likelihood calculations assumed an age-of-onset correction, marker allele
frequencies derived from published series of mixed Caucasian subjects, and an
estimated allele frequency for the AD3 mutation of 1:1000, as previously described
(St George-Hyslop, P et al., 1992). To ensure that inaccuracies in the estimated
parameters used in the maximum likelihood calculations did not misdirect the analyses,
the analyses were repeated using equal marker allele frequencies, and using phenotype
information only from affected pedigree members, as previously described
(St George-Hyslop, P et al., 1992). These supplemental analyses did not significantly
alter either the evidence supporting linkage or the discovery of recombination events.
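The lod scores in this example come from full likelihood calculations with an age-of-onset correction and estimated allele frequencies. For orientation only, the sketch below shows the much simpler phase-known, fully penetrant two-point case, in which R recombinants among N informative meioses give LOD(θ) = log10[θ^R (1 - θ)^(N - R) / 0.5^N], maximized over θ by a grid search. It is an illustration under those simplifying assumptions, not the analysis used for these pedigrees, and the function names are hypothetical.

```python
import math


def two_point_lod(theta: float, recombinants: int, meioses: int) -> float:
    """Phase-known, fully penetrant two-point LOD score at recombination fraction theta.

    LOD(theta) = log10(theta**R * (1 - theta)**(N - R) / 0.5**N)
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    non_recombinants = meioses - recombinants
    # Python evaluates 0.0 ** 0 as 1.0, so theta = 0 works when R = 0.
    likelihood = theta ** recombinants * (1.0 - theta) ** non_recombinants
    if likelihood == 0.0:
        # theta = 0 is excluded as soon as one recombinant is observed.
        return float("-inf")
    return math.log10(likelihood) - meioses * math.log10(0.5)


def max_lod(recombinants: int, meioses: int, step: float = 0.01) -> tuple[float, float]:
    """Grid search for (theta_hat, Zmax) over theta = 0, step, ..., 0.5."""
    n_steps = int(round(0.5 / step))
    grid = [min(i * step, 0.5) for i in range(n_steps + 1)]
    theta_hat = max(grid, key=lambda t: two_point_lod(t, recombinants, meioses))
    return theta_hat, two_point_lod(theta_hat, recombinants, meioses)


if __name__ == "__main__":
    # Ten informative meioses, no recombinants: Zmax = 10 * log10(2) at theta = 0.
    print(max_lod(0, 10))
    # One recombinant in ten meioses: the maximum sits at theta = R / N = 0.1.
    print(max_lod(1, 10))
```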
Example 3. Haplotypes between flanking markers segregating with AD3 in FAD
pedigrees

Extended haplotypes between the centromeric and telomeric flanking markers
on the parental copy of chromosome 14 segregating with AD3 are shown for fourteen
early-onset FAD pedigrees (pedigrees NIH2, MGH1, Tor1.1, FAD4, FAD1, MEX1 and FAD2
show pedigree-specific lod scores ≥ +3.00 with at least one marker between
D14S258 and D14S53). Identical partial haplotypes (boxed) are observed in two
regions of the disease-bearing chromosome segregating in several pedigrees of similar
ethnic origin. In region A, shared alleles are seen at D14S268 ("B": allele size = 126
bp, allele frequency in normal Caucasians = 0.04; "C": size = 124 bp, frequency
= 0.38); D14S277 ("B": size = 156 bp, frequency = 0.19; "C": size = 154 bp,
frequency = 0.33); and RSCAT6 ("D": size = 111 bp, frequency = 0.25; "E": size =
109 bp, frequency = 0.20; "F": size = 107 bp, frequency = 0.47). In region B,
alleles of identical size are observed at D14S43 ("A": size = 193 bp, frequency =
0.01; "D": size = 187 bp, frequency = 0.12; "E": size = 185 bp, frequency =
0.26; "T": size = 160 bp, frequency = 0.38); D14S273 ("3": size = 193 bp,
frequency = 0.38; "4": size = 191 bp, frequency = 0.16; "5": size = 189 bp,
frequency = 0.34; "6": size = 187 bp, frequency = 0.02); and D14S76 ("1": size
= bp, frequency = 0.01; "5": size = bp, frequency = 0.38; "6": size = bp,
frequency = 0.07; "9": size = bp, frequency = 0.38). The ethnic origins of each
pedigree are abbreviated as: Ashk = Ashkenazi Jewish; Ital = Southern Italian; Angl
= Anglo-Saxon-Celt; FrCan = French Canadian; Jpn = Japanese; Mex = Mexican
Caucasian; Ger = German; Am = American Caucasian. The type of mutation
detected is indicated by the amino acid substitution and putative codon number, or by
ND where no mutation has been detected; in these pedigrees a comprehensive survey
has not been undertaken because no source of mRNA was available for RT-PCR studies.

Example 4. Recovery of transcribed sequences from the AD3 interval.

Putative transcribed sequences encoded in the AD3 interval were recovered
using a direct hybridization method in which short cDNA fragments generated
from human brain mRNA were hybridized to immobilized cloned genomic DNA
fragments (Rommens, JM et al., 1993). The resultant short, putatively transcribed
sequences were used as probes to recover longer transcripts from human brain cDNA
libraries (Stratagene, La Jolla). The physical locations of the original short clones and
of the subsequently acquired longer cDNA clones were established by analysis of the
hybridization patterns generated by hybridizing each probe to Southern blots containing
a panel of EcoRI-digested total DNA samples isolated from individual YAC clones
within the contig. The nucleotide sequence of each of the longer cDNA clones was
determined by automated cycle sequencing (Applied Biosystems Inc., CA) and
compared with other sequences in nucleotide and protein databases using the BLAST
algorithm (Altschul, SF et al., 1990). Accession numbers for the transcribed
sequences in this report are: L40391, L40392, L40393, L40394, L40395, L40396,
L40397, L40398, L40399, L40400, L40401, L40402 and L40403.

Example 5. Locating mutations in the ARMP gene using restriction enzymes.

The presence of the Ala246Glu mutation, which creates a DdeI restriction site,
was assayed in genomic DNA by PCR using the end-labelled primer 849
(5'-atctccggcaggcatatct-3') SEQ ID No: 126 and the unlabelled primer 892
(5'-tgaaatcacagccaagatgag-3') SEQ ID No: 127 to amplify an 84 bp genomic exon
fragment, using 100 ng of genomic DNA template, 2 mM MgCl2, 10 pmol of each primer,
0.5 U Taq polymerase and 250 µM dNTPs for 30 cycles of 95°C X 20 seconds, 60°C X
20 seconds and 72°C X 5 seconds. The products were incubated with an excess of DdeI
for 2 hours according to the manufacturer's protocol, and the resulting restriction
fragments were resolved on a 6% nondenaturing polyacrylamide gel and visualized
by autoradiography. The presence of the mutation was inferred from cleavage of
the 84 bp fragment at the DdeI restriction site created by the mutation. All affected
members of the FAD1 pedigree (filled symbols) and several at-risk members ("R")
carried the DdeI site. None of the obligate escapees (individuals who do not develop
the disease, age > 70 years) and none of the normal controls carried the DdeI
site.
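The assay works because the mutation creates a new DdeI recognition sequence (5'-CTNAG-3') in the amplicon, so mutant alleles are cut and wild-type alleles are not. The sketch below shows that logic on a DNA string. The sequences are hypothetical toy examples (the 84 bp amplicon sequence is not given here), the helper names are hypothetical, and the fragment lengths are top-strand lengths that ignore the 3-nt 5' overhang DdeI leaves.

```python
import re

# DdeI recognizes 5'-CTNAG-3' and cuts between C and T (C^TNAG).
# The site is palindromic, so scanning the top strand finds every site.
DDEI_SITE = re.compile(r"(?=CT[ACGT]AG)")  # lookahead allows overlapping matches


def ddei_sites(seq: str) -> list[int]:
    """0-based start positions of DdeI sites in a DNA sequence."""
    return [m.start() for m in DDEI_SITE.finditer(seq.upper())]


def top_strand_fragments(seq: str) -> list[int]:
    """Top-strand fragment lengths after complete DdeI digestion."""
    cuts = [pos + 1 for pos in ddei_sites(seq)]  # the cut falls after the C
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def mutation_creates_site(wild_type: str, mutant: str) -> bool:
    """True if the mutant amplicon carries more DdeI sites than the wild type."""
    return len(ddei_sites(mutant)) > len(ddei_sites(wild_type))


if __name__ == "__main__":
    # Hypothetical toy amplicons differing at one base (not the real 84 bp exon).
    wt = "AACTGCGTT"
    mut = "AACTGAGTT"
    print(mutation_creates_site(wt, mut), top_strand_fragments(mut))
```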
Example 6. Locating mutations in the ARMP gene using allele-specific
oligonucleotides.

The presence of the Cys410Tyr mutation was assayed using allele-specific
oligonucleotides. Genomic DNA (100 ng) was amplified with the exonic sequence
primer 885 (5'-ggagactggaacacaac-3') SEQ ID No: 128 and the opposing intronic
sequence primer 893 (5'-gtgtggccagggtagagaact-3') SEQ ID No: 129 using the above
reaction conditions, except that the MgCl2 concentration was 2.5 mM and the cycle
conditions were 94°C X 20 seconds, 58°C X 20 seconds and 72°C X 10 seconds. The
resultant 216 bp genomic fragment was denatured by 10-fold dilution in 0.4 M NaOH,
25 mM EDTA, and was vacuum slot-blotted onto duplicate nylon membranes. The
end-labelled "wild type" primer 890 (5'-ccatagcctgtttcgtagc-3') SEQ ID No: 130 and the
end-labelled "mutant" primer 891 (5'-ccatagcctAtttcgtagc-3') SEQ ID No: 131 were
hybridized to separate copies of the slot-blot filters in 5 X SSC, 5 X Denhardt's,
0.5% SDS for 1 hour at 48°C; the filters were then washed successively in 2 X SSC at
23°C and in 2 X SSC, 0.1% SDS at 50°C, and exposed to X-ray film. All testable
affected members, as well as some at-risk members, of the AD3 (shown) and NIH2
(not shown) pedigrees carried the Cys410Tyr mutation. Attempts to detect the
Cys410Tyr mutation by SSCP revealed that a common intronic sequence polymorphism
migrated with the same SSCP pattern.
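The two probes differ at a single base, so under stringent hybridization and wash conditions each hybridizes efficiently only to its matching allele. A quick way to confirm how a probe pair differs is to list the mismatched positions; the helper below is a hypothetical sketch, not part of the described protocol. Allele-specific probes are usually designed with the discriminating base near the middle, and for primers 890 and 891 as printed the single G/A difference sits at index 9 of 19, the central base.

```python
def probe_mismatches(wild_type: str, mutant: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, wild-type base, mutant base) for every difference."""
    if len(wild_type) != len(mutant):
        raise ValueError("allele-specific probes are expected to have equal length")
    return [
        (i, w, m)
        for i, (w, m) in enumerate(zip(wild_type.lower(), mutant.lower()))
        if w != m
    ]


# Probe pair for Cys410Tyr as given in the text (mutant base in upper case there).
WILD_TYPE_890 = "ccatagcctgtttcgtagc"
MUTANT_891 = "ccatagcctAtttcgtagc"

if __name__ == "__main__":
    diffs = probe_mismatches(WILD_TYPE_890, MUTANT_891)
    centre = len(WILD_TYPE_890) // 2
    print(diffs, "centre index:", centre)
```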


Example 7. Northern hybridization demonstrating the expression of ARMP
mRNA in a variety of tissues.

Total cytoplasmic RNA was isolated from various tissue samples (including
heart, brain and different brain regions, placenta, lung, liver, skeletal muscle, kidney
and pancreas) obtained from surgical pathology, using standard procedures such as
CsCl purification. The RNA was then electrophoresed on a formaldehyde gel to
permit size fractionation and transferred onto a nitrocellulose membrane.
32P-labelled cDNA probes were prepared and added to the membrane to allow
hybridization between the probe and the RNA. After washing, the membrane was
wrapped in plastic film and placed into imaging cassettes containing X-ray film.
The autoradiographs were then allowed to develop for one to several days. The
positions of the 28S and 18S rRNA bands are indicated. Sizing was established by
comparison with standard RNA markers. Analysis of the autoradiographs revealed a
prominent band at 3.0 kb. These northern blots demonstrated that the ARMP gene is
expressed in all of the tissues examined.
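Sizing against standard RNA markers usually relies on the approximately linear relationship between log(size) and migration distance over the range of the markers. The sketch below fits that line by least squares and interpolates an unknown band, assuming the relationship holds on the formaldehyde gel used here. The function names are hypothetical, and the marker sizes and distances in the usage lines are illustrative values, not measurements from these blots.

```python
import math


def fit_size_curve(distances_mm: list[float], sizes_kb: list[float]) -> tuple[float, float]:
    """Least-squares fit of log10(size) = a + b * distance for the marker bands."""
    n = len(distances_mm)
    if n < 2 or n != len(sizes_kb):
        raise ValueError("need at least two markers with matching sizes")
    ys = [math.log10(s) for s in sizes_kb]
    mean_x = sum(distances_mm) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_mm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances_mm, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_size_kb(distance_mm: float, intercept: float, slope: float) -> float:
    """Interpolate the size of an unknown band from its migration distance."""
    return 10 ** (intercept + slope * distance_mm)


if __name__ == "__main__":
    # Hypothetical marker lane: sizes in kb and migration distances in mm.
    marker_kb = [9.5, 7.5, 4.4, 2.4, 1.4]
    marker_mm = [12.0, 15.0, 24.0, 34.0, 43.0]
    a, b = fit_size_curve(marker_mm, marker_kb)
    print(round(estimate_size_kb(30.0, a, b), 2))
```

Interpolation is only trustworthy between the smallest and largest markers; bands outside that range are better sized with additional markers.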

Example 8: Eukaryotic and Prokaryotic Expression Vector Systems 

Eukaryotic and prokaryotic expression systems have been generated using two
different classes of ARMP cDNA sequence inserts. In the first class,
termed full-length constructs, the entire ARMP cDNA sequence was inserted into the
expression plasmid in the correct orientation, and included both the natural 5' UTR
and 3' UTR sequences as well as the entire open reading frame. The open reading
frames bear a nucleotide sequence cassette which allows either the wild type open
reading frame to be included in the expression system or, alternatively, single
mutations or combinations of two mutations to be inserted into the open reading frame.
This was accomplished by removing a restriction fragment from the wild type open
reading frame using the enzymes NarI and PflMI and replacing it with a similar fragment,
generated by reverse transcriptase PCR, which bears the nucleotide sequence
encoding either the Met146Leu mutation or the His163Arg mutation. A second
restriction fragment was removed from the wild type nucleotide sequence of
the open reading frame by cleavage with the enzymes PflMI and NcoI and replaced
with restriction fragments bearing the nucleotide sequence encoding either the
Ala246Glu, Ala260Val, Ala285Val, Leu286Val, Leu392Val or Cys410Tyr mutation.
Finally, a third variant was constructed, bearing either the Met146Leu or the His163Arg
mutation in tandem with the remaining mutations, by linking the NarI-PflMI fragment
bearing these mutations to the PflMI-NcoI fragments bearing the remaining
mutations.

A second class of cDNA inserts bearing wild type or mutant cDNA
sequences was constructed by removing the 5' UTR and part of the 3' UTR
sequences from the full-length cDNA. The 5' UTR sequence was replaced with a synthetic
oligonucleotide containing a KpnI restriction site and a Kozak initiation site
(oligonucleotide 969: ggtaccgccaccatgacagaggtacctgcac) SEQ ID No: 138. The 3'
UTR was replaced with an oligonucleotide corresponding to position 2566 of the
cDNA and bearing an artificial EcoRI site (oligonucleotide
970: gaattcactggctgtagaaaaagac) SEQ ID No: 139. Mutant variants of this construct
were then made by inserting the same mutant sequences described above at the
NarI-PflMI fragment and at the PflMI-NcoI sites.

For eukaryotic expression, these various cDNA constructs bearing wild type
and mutant sequences were cloned into the expression vector pZeoSV (Invitrogen).
For prokaryotic expression, two constructs were made using the glutathione
S-transferase (GST) fusion vector pGEX-KG. In the first, the inserts attached to the
GST fusion nucleotide sequence are the same nucleotide sequences described above,
generated with the oligonucleotide primers 969 (SEQ ID No: 138) and 970
(SEQ ID No: 139), bearing either the normal open reading frame nucleotide
sequence or a combination of single and double mutations as described above.
This construct allows expression of the full-length protein in mutant and wild type
variants in prokaryotic cell systems as a GST fusion protein, which allows
purification of the full-length protein followed by removal of the GST fusion partner
by thrombin digestion. The second prokaryotic cDNA construct was generated to
create a fusion protein with the same vector, and allows the production of the amino
acid sequence corresponding to the hydrophilic acidic loop domain between TM6 and
TM7 of the full-length protein, either as a wild type sequence or as a mutant sequence
bearing the Ala285Val, Leu286Val or Leu392Val mutation. This was accomplished by
recovering wild type or mutant sequence from appropriate sources of RNA using the
oligonucleotide primers 989: ggatccggtccacttcgtatgctg (SEQ ID No: 140) and
990: ttttttgaattcttaggctatggttgtgttcca (SEQ ID No: 141). This allows cloning of the
appropriate mutant or wild type nucleotide sequence corresponding to the hydrophilic
acidic loop domain at the BamHI and EcoRI sites within the pGEX-KG vector.

These prokaryotic expression systems allow the holo-protein or various 
important functional domains of the protein to be recovered as fusion proteins and 
then used for binding studies, structural studies, functional studies, and for the 
generation of appropriate antibodies. 
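Each cloning oligonucleotide in this example is meant to carry a specific restriction site (KpnI in 969, EcoRI in 970 and 990, BamHI in 989), and 969 also supplies a Kozak context before the ATG. A simple motif scan is one way to check oligonucleotides like these before cloning; the sketch below is a hypothetical helper, not part of the described procedure, and it reads the first six bases of 970 as GAATTC, the EcoRI site the text says it carries. For 969 as printed, the scan reports GGTACC at offsets 0 and 20, that is, a second KpnI sequence inside the oligo in addition to the one at its 5' end.

```python
# Recognition sequences (5'->3') for the sites named in the text, plus the
# core Kozak context around the start codon.
MOTIFS = {
    "KpnI": "GGTACC",
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "Kozak GCCACC-ATG": "GCCACCATG",
}

# Oligonucleotides 969, 970, 989 and 990 as given in this example.
OLIGOS = {
    "969": "ggtaccgccaccatgacagaggtacctgcac",
    "970": "gaattcactggctgtagaaaaagac",
    "989": "ggatccggtccacttcgtatgctg",
    "990": "ttttttgaattcttaggctatggttgtgttcca",
}


def find_motif(seq: str, motif: str) -> list[int]:
    """0-based offsets of every (possibly overlapping) occurrence of motif in seq."""
    seq, motif = seq.upper(), motif.upper()
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


def scan_oligos(oligos: dict[str, str], motifs: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """For each oligo, report the offsets of every motif it contains."""
    report = {}
    for name, seq in oligos.items():
        hits = {label: find_motif(seq, m) for label, m in motifs.items()}
        report[name] = {label: pos for label, pos in hits.items() if pos}
    return report


if __name__ == "__main__":
    for name, hits in scan_oligos(OLIGOS, MOTIFS).items():
        print(name, hits)
```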

Example 9: Identification of Three New Mutations in the ARMP Gene

Three novel mutations have been identified in subjects affected with early-onset
Alzheimer's disease. All of these mutations co-segregate with the disease and
are absent from at least 200 normal chromosomes. The three mutations are as
follows: a substitution of C by T at position 1027, which results in the substitution
of alanine 260 by valine; a substitution of C by T at position 1102, which results in
the substitution of alanine 285 by valine; and a substitution of C by G at position 1422,
which results in the substitution of leucine 392 by valine. Significantly, all of these
mutations occur within the acidic hydrophilic loop between putative TM6 and TM7.
Two of the mutations (A260V and A285V), as well as the L286V mutation, are also
located in the alternatively spliced domain.
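Each mutation above is reported both as a cDNA position and as a codon number. The two are related by codon = floor((position - start) / 3) + 1, where start is the cDNA position of the A of the initiating ATG, which this example does not state. The sketch below (hypothetical helper names) inverts that relation to find the start positions consistent with all three reported pairs. As an additional, explicitly assumed constraint it uses the standard genetic code, under which an Ala-to-Val change by C-to-T can only occur at the second codon position (GCN to GTN) and a Leu-to-Val change by C-to-G only at the first (CTN to GTN). With the position/codon pairs alone, starts 248 and 249 both fit; adding the codon-position constraint leaves only 249.

```python
def codon_of(position: int, start: int) -> tuple[int, int]:
    """Map a 1-based cDNA position to (codon number, position within codon, 1-3).

    `start` is the 1-based cDNA position of the A of the initiating ATG.
    """
    offset = position - start
    return offset // 3 + 1, offset % 3 + 1


# (cDNA position, reported codon, codon position implied by the base change).
# The third field assumes the standard genetic code: Ala->Val by C->T needs
# position 2 (GCN -> GTN); Leu->Val by C->G needs position 1 (CTN -> GTN).
REPORTED = [
    (1027, 260, 2),  # Ala260Val
    (1102, 285, 2),  # Ala285Val
    (1422, 392, 1),  # Leu392Val
]


def consistent_starts(
    changes: list[tuple[int, int, int]],
    candidates: range = range(1, 1001),
    check_codon_position: bool = True,
) -> list[int]:
    """Return every candidate ORF start under which all changes map as reported."""
    hits = []
    for start in candidates:
        ok = True
        for position, codon, codon_pos in changes:
            number, pos_in_codon = codon_of(position, start)
            if number != codon or (check_codon_position and pos_in_codon != codon_pos):
                ok = False
                break
        if ok:
            hits.append(start)
    return hits


if __name__ == "__main__":
    print(consistent_starts(REPORTED, check_codon_position=False))  # [248, 249]
    print(consistent_starts(REPORTED))  # [249]
```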

The three new mutations, like the other mutations, can be assayed by a variety
of strategies (direct nucleotide sequencing, allele-specific oligonucleotides, ligation
polymerase chain reaction, SSCP, RFLPs) using either RT-PCR products representing the
mature mRNA/cDNA sequence or genomic DNA. We have used allele-specific
oligonucleotides. For the A260V and A285V mutations, genomic DNA carrying the exon
can be amplified using the same PCR primers and methods as for the L286V mutation.
The PCR products were then denatured and slot-blotted to duplicate nylon membranes
using the slot-blot protocol described for the Cys410Tyr mutation.

The Ala260Val mutation was scored on these blots by hybridization with
end-labelled allele-specific oligonucleotides corresponding to the wild type sequence
(994: gattagtggttgttttgtg) SEQ ID No: 142 or the mutant sequence
(995: gattagtggctgttttgtg) SEQ ID No: 143 at 48°C, followed by a wash
at 52°C in 3X SSC buffer containing 0.1% SDS. The Ala285Val mutation was
scored on these slot blots as described above, but using instead the allele-specific
oligonucleotides for the wild type sequence (1003: tttttccagctctcattta) SEQ ID No: 144
or the mutant sequence (1004: tttttccagttctcattta) SEQ ID No: 145 at 48°C, followed by
washing at 52°C as above, except that the wash solution was 2X SSC.

The Leu392Val mutation was scored by amplification of the exon from
genomic DNA using primers 996 (aaacttggattgggagat) SEQ ID No: 147 and 893
(gtgtggccagggtagagaact) SEQ ID No: 129 under standard PCR buffer conditions,
except that the magnesium concentration was 2 mM, with cycle conditions of
94°C for 10 seconds, 56°C for 20 seconds and 72°C for 10 seconds. The resulting
200 bp genomic fragment was denatured as described for the Cys410Tyr
mutation and slot-blotted in duplicate to nylon membranes. The presence or absence
of the mutation was then scored by differential hybridization to either a wild type
end-labelled oligonucleotide (999: tacagtgttctggttggta) SEQ ID No: 146 or an
end-labelled mutant primer (100: tacagtgttgtggttggta) SEQ ID No: 148 at 45°C,
followed by successive washes in 2X SSC at 23°C and then at 68°C.

Example 10: Polyclonal Antibody Production 

Peptide antigens were synthesized by solid-phase techniques and purified by
reverse-phase high pressure liquid chromatography. Peptides were covalently linked
to keyhole limpet hemocyanin (KLH) via disulfide linkages, made possible by the
addition of a cysteine residue at the peptide C-terminus. This additional residue
does not occur at that position in the native protein sequence and was included only
to facilitate linkage to the KLH molecule. For each peptide antigen, a total of three
rabbits were immunized with peptide-KLH complexes and subsequently given booster
injections at seven-day intervals. Antisera were collected for each peptide and pooled,
and IgG was precipitated with ammonium sulfate. Antibodies were then affinity-purified
with Sulfo-link agarose (Pierce) coupled to the appropriate peptide. This final
purification is required to remove non-specific interactions of other antibodies present
in either the pre- or post-immune serum.

The specific sequences to which we have raised antibodies are:

Polyclonal antibody 1: NDNRERQEHNDRRSL(C) - residues 30-45
Polyclonal antibody 2: KDGQLIYTPFTEDTE(C) - residues 109-120
Polyclonal antibody 3: EAQRRVSKNSKYNAE(C) - residues 304-319
Polyclonal antibody 4: SHLGPHRSTPESRAA(C) - residues 346-360

The non-native cysteine residue added at the C-terminus is indicated by (C). These
sequences are contained within various predicted domains of the protein. For
example, antibodies 1, 3 and 4 are directed against potentially functional domains that
are exposed to the aqueous medium and may be involved in binding to other proteins
critical for the development of the disease phenotype. Antibody 2 corresponds to a
short linking region situated between the predicted first and second transmembrane
helices.

Example 11: Identification of two mutations in the E5-1 gene

RT-PCR products corresponding to the E5-1 ORF were generated from RNA
of lymphoblasts or frozen post-mortem brain tissue using the oligonucleotide primer
pairs 1021: 5'-cagaggatggagagaatac and 1018: 5'-ggctccccaaaactgtcat (product = 888 bp),
and 1071: 5'-gccctagtgttcatcaagta and 1022: 5'-aaagcgggagccaaagtc (product = 826 bp),
by PCR using 250 µM dNTPs, 2.5 mM MgCl2 and 10 pmol oligonucleotides in 10 µl,
cycled for 40 cycles of 94°C X 20 seconds, 58°C X 20 seconds and 72°C X 45
seconds. The PCR products were sequenced by automated cycle sequencing (ABI,
Foster City, CA), and the fluorescent chromatograms were scanned for heterozygous
nucleotide substitutions by direct inspection and with the Factura (ver 1.2.0) and
Sequence Navigator (ver 1.0.1b15) software packages (data not shown).

Asn141Ile: The A→T substitution at nucleotide 787 creates a BclI restriction
site. The exon bearing this mutation was amplified from 100 ng of genomic DNA
using 10 pmol of oligonucleotides 1041: 5'-cattcactgaggacacacc (end-labelled) and
1042: 5'-tgtagagcaccaccaaga (unlabelled), and PCR reaction conditions similar to
those described below for the Met239Val mutation. 2 µl of the PCR product was
restricted with BclI (NEBL, Beverly, MA) in a 10 µl reaction volume according to the
manufacturer's protocol, and the products were resolved by non-denaturing
polyacrylamide gel electrophoresis. In subjects with wild type sequences, the 114 bp
PCR product is cleaved into 68 bp and 46 bp fragments. Mutant sequences cause the
product to be cleaved into 53 bp, 46 bp and 15 bp fragments.

Met239Val: The substitution at nucleotide 1080 deletes an NlaIII
restriction site, allowing the presence of the Met239Val mutation to be detected by
amplification from 100 ng of genomic DNA using PCR (10 pmol oligonucleotides
1034: 5'-gcatggtgtgcatccact and 1035: 5'-ggaccactctgggaggta; 0.5 U Taq polymerase,
250 µM dNTPs, 1 µCi alpha-32P-dCTP, 1.5 mM MgCl2, 10 µl volume; 30 cycles of
94°C X 30 seconds, 58°C X 20 seconds, 72°C X 20 seconds) to generate a 110 bp
product. 2 µl of the PCR reaction was diluted to 10 µl and restricted with 3 U of
NlaIII (NEBL, Beverly, MA) for 3 hours. The restriction products were resolved by
non-denaturing polyacrylamide gel electrophoresis and visualized by autoradiography.
Normal subjects show cleavage products of 55, 35, 15 and 6 bp, whereas the mutant
sequence gives fragments of 55, 50 and 6 bp.

Although preferred embodiments of the invention have been described herein
in detail, it will be understood by those skilled in the art that variations may be made
thereto without departing from the spirit of the invention or the scope of the appended
claims.
[Figure: ARMP cDNA nucleotide sequence]



[Figure: nucleotide sequence]



[Figure: amino acid sequence]

\ CAATTCGGCA COAC^-AAAT GCTCr~3CT C'AACACCTC 7CACCGCGCA CGTGC-..^ 

i * : 70 30 30 -™ 120 

. GCCGGGA7TA GTAGC'GTCT GAACTGCAG7 CCAGTAGGAG AAAGAGCAAC CGT^.TGGGC 

4 ina no l«-2 1:0 170 130 

TCCCTCT'JCT 7CACCAAC7G CTCAAACTCC CCGCCTCACC C«CGGG7GT GTCC7TG.-C 

: c 0 2C0 210 2^0 230 2*3 

AGGGGCGACG AGCAT7C7GG GCGAAG7CGG CAC3CC7C77 G77CGAGCCG GAAGACGGGG 

2S0 2$0 2T0 230 290 300 

TC7CA7SCT7 7C7CC77GG7 CGGGKC7C-7C TCSAGGCATG CATC7CCAGT GAC7C77G7G 

310 320 330 340 350 3*0 

rrrccrccra err ccrrcrc agattcttct caccgttgtg g7Cagc7Ctg ctttaggca? 

370 330 350 400 410 420 

TA77AATCCA TAG7GGAGGC 7GGGA7SGG7 CAGAGAA77G AGG7GAC777 7CCA7AA. -C 

4 i 0 440 4=0 -ISO 470 430 

AGACC7AATC 7GGGAGCC7G CAAG7CACAA CAGCC777GC G07CC77AGA CAGCTTGGCC 

430 500 5ia 320 530 ^340 

TGGAGGACAA CACA7GAAAG AAAGAACGTG AAGAGGC777 G7777C7G7G AAACAGTA. . 

530 550 =70 530 590 500 

TC7A7ACAG7 7GC7CCAA7S ACAGAG77AC CTCCACCG77 G7CC7AC77C CAGAA7GCAC 

3 'L 510 520 530 €40 530 550 

I ^ AGAXG7C7GA GGACAACGAC C7GACCAATA CTfcATGACAA TAGAGAACGG CAGGAGCACA 

I 6 70 530 530 7Q0 710 720 

ACGACAGACG GAGCCTTGGC CACCCTGACC CA77A7C7AA TGGACGACCC CAGG^AACT 

730 740 750 750 770 -730 

CCCGGCAGGT GC7GGAGCAA CATGAGGAAG AAGA7GAGGA GC7GACAT7C AAATATGGCG 

790 800 813 820 830 340 

CCAAGCATGT GA7CATGCTC TTTGTCGCTG TGACTC7C7G CATCG7GG7G G7CG7GGC7A 

BS0 850 870 880 890 900 

CCATTAAGTC AGTCAGCTTT TATACCCGCA AGCATGGGCA GCTAATCTAT ACCCCAT7CA 

910 920 930 940 950 960 

CACAAGATAC CGAGACTGTG CGCCACACAG CCC7GCAC7C AATTCTCAAT CC7CCCA7CA 

970 980 990 1000 1010 1020 

7GA7CAG7G7 CAT7G77G7C ATGAC7A7CC TCCTCG7GG7 TCTG7A7AAA TACAGG7GC7 

-030 1040 1050 1060 1070 1089 

ATAAGG7CAT CCATCCCtCG C 77 A 77 A 7 AT CATCTCTA7T C7TCC7GT7C 1TI i . XAT 



;om ::og :::o ::;-> ;:3o 

T7ATT7AC7T CGGGGAAG7G TTT.V^ACrT A7AACT77GC 7-}7TCAC7AC ATT AC7C 77S 



115-3 1150 1170 ;;hc 

CAC7T27GA7 ctcgaatttg ^tct^tco caa7catt7c cat: 



1130 1230 
AAAGGT72AC 



12 1!) 1220 1220 1243 

CCACCCA7A7 C7CAC7A72A 77AG73C2C7 CAT: 



77G G7G7T7A7CA 



lira 1230 , 1250 

ACTACTTCCC 7SAA7GGAC7 CCG7GCC7CA TS 



13*33 L310 1320 

7CGC7G7 GA777CAG7A 7ATC-A7T7AG 



1330 1343 
TC-GC7C7TT7 CTGTCCGAAA C-C: 



13 SO 1350 1370 1330 

G7A7GCTGG7 TCAAACAGCT CAGCACAGAA 



1390 

A7GPAACGC7 



1400 1413 
> • G * G ^ • * ^^C7 



1420 1430 1440 

"AACAA7 GG7G7GG77G G7SAA7A7GG 



1450 
CACAAGCACA 



UoO 1470 1430 

rr CAAACCACAG TA7TTAAAAA T. 



1490 1500 
;7 AA7GCAGAAA 



ifi: 



1523 



1530 



1540 



15 = 3 



CCACAGAAAG GCAG7CACAA CACAC7G77G CACAGAA7GA 7GATC 



ifa*: 

T7CAG7GAGG 



1570 1S50 ISrO 1500 1510 1520 

AA7GC-GAAGC C2AGACGGAC AG7CA7C7AG GGC27TATTG C7C7ACACGT GAG7CACGAG 



1530 1=40 1550 1550 1570 1530 

C7GC7G7C2A C-3AAC777CC AGCAG7A7CC TCGCTGG7GA AGACG2AC-AG GAAAGGGGAG 

1550 1700 1710 1720 1730 1740 

TAAAAC7TGG AT7GCCAGA7 T7CAT7T7CT ACAC7C77CT GG77CG7AAA GCC7CACCAA 

1750 1750 1770 1780 1790 1900 

CAGC7AG7CG AGAC7CGAAC ACAACCA7AC CC7G7T7GG7 AGC2ATA7TA A7TGGT77G7 

1310 1320 1330 1940 1350 1950 

CCC77ACAT7 ATTAC7CCT7 GCCA7TTTCA AGAAAGCA77 GC2AGCTC77 CCAA7C7CCA 

1370 1330 1990 1500 1910 1920 

7CACC7TTCG GCTT GTTTT C TACTTTGCCA CACA77A7C7 TG7ACAGCC7 777ATGGAC2 

1530 1940 1550 1550 1970 1930 

AAT7AGCA7T CCATCAA77T TA7A7C7AGC A7AT7TGCGG T7AGAA7CCC A7GGATG777 

1990 2000 2010 2020 2030 2040 

CTTC777CAC 7ATAACCAAA TC7CGGCAGG ACAAAGCTGA TT77C27G7G 7CCACATC7A 

20S0 2050 2O70 2080 2090 2100 

ACAAAG7CAA GA77CCCCCC 7GGAC7T77G CAGC7TCC77 CCAAG7C77C C7GACCACG7 

2110 2120 2130 2140 2150 2150 

TCCAC7AT7G CAC777GGAA CGACCTCCCT A7AC-AAAACG A77T7GAACA TAC777ATTG 



0 



m _ _ --*o :::: 

'^AvjkCCACT^ Tu * CC7CSC7 GCAGAAAC7A -77A£A ». - - 7A OCCACjACOT CAA0CACA7A 

2230 2240 2250 2260 2270 2280

7tU7AGCCCC GCAACi, .GC7 07.""£A7C AGCAGCTTGA CGCG7507CA CAGc;AC3A77 

2290 2300 2310 2320 2330 2340

TCACTGACAC T-CGAACTCT CAGCACTACT CCTTACCAAG AGCTTAGGTG AAGT UCTTTT A

2350 2360 2370 2380 2390 2400

AACCAAACGG AACTCTTCAT CTTAAACTAC ACGTT5AAAA TCAACCCAAT AATTCTCTAT

2410 2420 2430 2440 2450 2460

TAACTGAATT CTGAACTTTT CAGGAGGTAC T0TGAGGAAG AGCAGCCACC ACCAGCACAA

2470 2480 2490 2500 2510 2520

TGCGGAATGG AGAGGTGGCC ACGCCTTCCA GCTTCCCTTT GATTTTTTGC TGCAGACTCA

2530 2540 2550 2560 2570 2580

TorrrrrrAA atgagacttg ttttccgctc tctttgagtc aagtcaaata tgtagai 



2590 2600 2610 2620 2630 2640

TTT5GCAATT CTTCTTCTCA AGCACTGACA CTCATTACCG TCTGTGATTG CCATTTCTTC

2650 2660 2670 2680 2690 2700

CZXkGGCCXC TCTSAACCTG AGC-TT 0CTTT ATCCTAAAAG TTTTAACCTC AGGTTCCAAA

2710 2720 2730 2740 2750 2760

TTCAGTAAAT TTTCGAAACA GTACACCTAT TTCTCATCAA TTCTCTATCA TGTTGAAGTC

2770 2780 2790 2800 2810 2820

AAATTTCGAT TTTCCACCAA ATTCTCAATT TGTAGACATA CTTSTACGCT CACTTGCCCC

2830 2840 2850 2860 2870 2880

AGATSCCTCC TCTCTCCTTA TTCTTCTCTC CCACACAAGC AGTCTTTTTC TACAGCCAGT

2890 2900 2910 2920 2930 2940

AACGCAGCTC TGTCGTGGTA CCACATGGTC CCACTTATTC TACGCTCTTA CTCTTTCTAT

2950 2960 2970 2980 2990 3000

CATGAAAAGA ATGTGTTATG AATCGGTGCT GTCAGCCCTG CTGTCAGACC TTCTTCCACA

3010 3020 3030 3040 3050 3060

GCAAAT3AGA TGTATGCCCA AAGCGGTAGA ATTAAAGAAG AGTAAAATGG CTCTTGAAGC

3070 3080 3090 3100 3110 3120 
AAAAAAAAAA AAAAAAAAAA AAAAAAA 




8&? — 392 .c<»r. 



10 20 30 40 50 60

GTT^rranAA ccaactxags achtxccac:: xsccsaacac oac^xcaxc xccsocacs;! 

70 80 90 100 110 120 

AAACACTnCA GTTCACCCSX CATTCCACCZ ACTTTACTCC AACCCXGCCC AACZAAAAX3 

130 140 150 160 170 180

ACACACX«C tCCAAACACA AAAACAAAAA CAAAAAAAGA GXAAAXXAAX XXAffACSCAA 

190 200 210 220 230 240

CNATTAAATA AATAATAGCA CAGXTCAXAX AGCX7AXGGT AAAAX7AXAA ACCTSCCANA 

250 260 270 280 290 300

TXAAXAXCXA AXGXXXCCCA CCCAXCACAX XA7IC7AAAX AAXGXXXX5G XCCAAAXXAX 

310 320 330 340 350 360 

tcxacaxcxx xtaaaatcxt; xcxaaxtxtx xxtcagcgaa gtcxtxaaaa ccxaxaacsx 

370 380 390 400 410 420 

XCCXCXGCAC TACATTACTC XTMCACXCCT CAXCTSSAAX X7XCSXUXGG XGGGAAX5AX 

430 440 450 460 470 480

XTCCAXTCAC XCCAAACCXC CACXXCSACX CCACCACGCA XAXCXCAXTA XGAXTAGXGC 

490 500 510 520 530 540

cctcatcticc cxosctsxtxa xcaagxacct cccxsaatcg acxshcxcgc xcaxcxxccc 

550 560 570 580 590 600 

TGXCAXX7CA GTAZATCGXA AAACXCAACA CX5AXAAXTT CXXXSXCACA CGAAXCCCCC 

610 620 630 640 650 660

AC7CCAGTGT rTTCXTXCCX CVXCXC7TTA XCXXCATXXA GACAAAATSC XAACSTSTAC 

670 680 690 700 710 720

AXCCCAXAAC TC7TCAGXAA ATCATTAATT ACCXAXAGXA ACXXTXTCAX XX5AACAXCX 

730 740 750 760 770 780 

CCCCXCGCA TCSTACCTCA XGCC7S7AAX CXTACCACTT TCGSACCCXC ACCC5CSCAG 

790 800 810 820 830 840 

AXCACCXAAC CCCACAGTTC AACACCACCC XCCCCAACAX CCCAAAACCX CSXAXCXACA 

850 860 870 880 890 900 

CAAAAXACAA AAATTACCC5 GCCAXCCXGG XGCACACTXC TAGTXCCAGC XACXXAGGAG 

910 920 930 940 950 960 
GCXCACCXCC CACCAXCCAX TGAXCCCAGS AGCXCAACTC XCCAC 



10 20 30 40 50 60 

g— ccaaagt cArMArrcc tttacctacc tacattaxca acctttttca gaataaaaxg 

70 80 90 100 110 120

AA— ACAC7 GTTACAGTCr AATTCTATAT CACCG7AAC ATATATCAGT 

130 140 150 160 170 180

AATACTscrr rrrrrrrrrr rr— rrrrr r.^iuxx tttccccaka cactctcgc? 

190 200 210 220 230 240

CTGTCSCCAC GTTCGACTCC AATCGTSCGA TCTTGCCTCA CTGAAACCTC CACCSCCC3S 

250 260 270 280 290 300

CTTCAAG7GA tTCTCCTCCC TCACC^ICCC AAGTAC^TTCC GACTACACCG C7CCCCCACC 

310 320 330 340 350 360 

ACCCCTCCGA TAATTTrCCC a - X ACTA CAfiAXGSCCT TTCACCAtfCT TCCNGCACOC 

370 380 390 400 410 420

rCOTCTTCCA ACTCCrSAWA TCATCAtCTG CCTCCCT7AC CCTCCCCAAA CTGCTCCGAX 

430 440 450 460 470 480
TSCAGCSCTC \CCZXCTZT? CCTCCGCCTC 



10 20 30 40 50 60 

cctcaicaxg cttcacoccs gaccctstsc ccsaacaatg ctcccacaca ghataaacaa 

70 80 90 100 110 120

T5CTCCCCCA CACCA7ACAG AATCCCCCCS CACACCATAC AGAACCCCCC GCACAGCA7A 

130 140 150 160 170 180

CACAATCCCC CCTCACACCA TACACAACCC CCCGCACACC ATAGAGAATC CTCTTCACC7 

190 200 210 220 230 240

crcGorrrrr aaccagccaa actaaaaica cacacg3cia cacatcattt aagatagaaa 

250 260 270 280 290 300

rrrcrsxArc rrrrAAtrrr tttciaacta gttttactta rrrrcACArr crArrrcrrr 

310 320 330 340 350 360

ACTACAAXTA AGCGATAAAA TAACAAXGTG TCCAIAATSA ACCCTACCAA AOlAAGWAA 

370 380 390 400 410 420

ccTAccmr rrrcArAcsr crrcrrccAc ArrcAArsAA cgtctottcx aaaa—taac 

430 440 450 460 470 480

CCCrCACCCA AATATTCACr TAACTATGTT AAAAACCCAG ACTTSTCArr GACT7TZZCC 

500 510 520 S30 540 
TGAAAACGCT TTCATAATTA TCTST5AAT3 TGTC7C 







901-?12.c«n 



10 20 30 40 50 60

CGA-cccrcc ccrrrrxACA ccaxacaagg xaacxtcccg acsttgccai cccaxcxgxa 
70 80 90 100 110 120

AACXGXCAXG C^TTZCCZZ CCAGXCXCXT XXAGCATGCT AAXSXAXXAX AAXXAGCGTA 

130 140 150 160 170 180

XAGXGAGCAG XSAGCATAAC CAGACGTCAC TCXCCXCACC AXCXTGGXXX XGSX5CCXXT 

190 200 210 220 230 240

xccccaccxt cxtxaxxsca accagtttxa xcaccaagat cxtxaxgacc xsxaxcxxgx 

250 260 270 280 290 300

ccxgacxxcc xaxcxcaxcc cs:taactaag agtaccxaac cxccxgcaaa xxgxachcca 

310 320 330 340 350 360 

GCIAGGXCXXG GZCZZZTTTa ACCCAGCCCC TAXXCAAAAX AGACTTJGTTC XXGCZfCGAAA 

370 380 390 400 410 420

CSCCVCXSAC ACAAGCAXXT TAAAGTCTTA TXAAXXAACG XAACAXACXX CCXTCSATAX 

430 440 450 460 470 480

CXCGXCXCAA AXCACAGAAA CCXCAATXTG CAAAAACCXC CTXCCASCXC CACCCAGXAA 

490 500 510 520 530 540

ACAAGXTTXC AXCCACCXGX CAGTATTXAA GCXACAXCXC AAAGCATAAG XACAAXTCXC 

550 560 570 580 590 600

XAXGTX5CCA XSAACAGACA GAATSGAGCA AffCCAACACC CAGSXAAAAG AGAGCACZXG 

610 620 630 640 650 660 

AAXGCCXXCA GTCAACAAXG AXAGAXAAXC TAGACXTXTA AACXGCAXAC XTCCTGXACA 

670 680 690 700 710 720

rrcTrrrrrc xtgcxxcacc u; v:a caac xcaxagxcac gggxcxgxtg xxaaxcccac 

730 740 750 760 770 780

GXCXAACCCX XACCXXCAXr CXGCXGACAA TCTGAXXXAC XGAAAAXGXT XTXCTTGXGC 

790 800 810 820 830 840

XXAXACAAXC ACAAXAGAGA ACGGCAGGAG GACAACGACA CACGGAGCCT X5GCCACCCX 

850 860 870 880 890 900

CAKCCAXXAX CXAATGCACC ACCCACGGXA ACXCCC3GCA GCXGGX5CAH CAAGAXGAGG 

910 920 930 940 950 960 

AACAAGAXGA CCAWCXSACA TXCAAAXAIG HCGSCAAGCA XGXGAXCAXG CXCXXXGXCC 

970 980 990 1000 1010 1020 

CXGXCACXCX CXSCAXGCXC CXCCTCGXGG HXACCAXXAA GXCAGXCAGC XTXTAXACCC 

1030 1040 1050 1060 1070 1080

GGAAGGAXGG CCAGCXGXAC GTAXGACXXX XCXXXTAXXX TXCXCAAASC CACXGXCGCX 

1090 1100 1110 1120 1130 1140 

XTTCXTXACA GCAXGXCAXC AXCACCXXGA AGGCCXCX3C AXXCAAGCGG CATCACXTAG 

1150 1160 1170 1180 1190 1200

CTGGACACCC CAXCCXCXCX CAXS3XCACG ACCACXXGAG AGAKCCAGGG CTTAXXACXX 

1210 1220 1230 1240 1250 1260






cvrsrrrrAA gtccacaaaa ccaacactcc acaactatct rrccrsTAtc sta— Acroo 

1270 1280 1290 1300 1310 1320

ATAC00C7GA ACTTATCCrp AA2CCAACAC ATAAAT7CTT TTCCACCTCA GCCriCATTGO 

1330 1340 1350 1360 1370 1380

ccscccatts trrcrrcTccc tacaaca— c t~ c c::r:f c rsAcrrxGSir gcattaaatt 

1390 1400 1410 1420 1430 1440

ccTc-rcAirc cccrcrrtrrr cc-stTArAT ataaactjitt cctcccscaa aacaactagc 

1450 1460 1470 1480 1490 1500

ACTC3AAXAT AAAATTT7CC TTTTAATTCT CAGCAAGSMA ACTTAC77C7 A2A7AGAAGC 

1510 1520 1530 1540 1550 1560

ctgcaccctt acagatgcaa oazcgcaac cscacatttc gcacaaccsa gccgaaaggc 

1570 1580 1590 1600 1610 1620

TTCTTATCCC TCACAC^CST CCTCCCtrGCT CCrrGTGTSCT ffCCCTCACTC AiTTACCCTTA 

1630 1640 1650 1660 1670 1680

CACTCGACAG CC2TAAACTA A~CCAA~0 GNTAATTTAA AGAGAAr:*AT GCGCTSAArC 

1690 1700 1710 1720 1730 1740

crrrsccAcs actcaacoaa cac;tacctag maggtaactt caatga 






?10-V_5, 



10 20 30 40 50 60



CICG7AXAAA AGACCAACAX TCCCANC^AC AACCACAGGC AACAXCTXCX CCXACCTXCC 

70 80 90 100 110 120

CCaNGCTGX AAXACCAAGX AXXC^CCAAX X7G7CAXAAA CTT7CAXTGC AAAGXGACCA 
130 140 150 160 170 180

cccxccxtgg txaaxacaxt gxcxgtcccx cctxxcacac xacagtagca cagttgagxg 

190 200 210 220 230 240 

XTXCCCCXGS AGACCAXAXG ACCCATAGAG CTXAAAAXAX XCAGXCTSGC XTXTXACACA 

250 260 270 280 290 300

catgxttcxo actxtcttaa tagaaaaxca acccaact^g rrxAAAXAAr gcacaxacxt 

310 320 330 340 350 360

tcxcxcxcax acactac— c acacgtag;k: agxccacatt agxasggtsg cttcacgttc 

370 380 390 400 410 420

ATCCAAGGAC XCAAXCXCCX TCXTXCXXCX X7AGC77CXA ACCXCXAGCX XACT7CAGGG 

430 440 450 460 470 480

XCCAGOCX5G ACCCCXASCC TTCAXTTCrS ACAGXAGGAA GCAOCS^ ACAAAACAAC 

490 500 510 520 530 540

AXAGGACAXG XCAGCACAA7 7CXCTCCXXA CAACTTCCAX ACACUCACA TCTCCTXACA 

550 560 570 580 590 600

agxcaxxgcc cxxacxxstx ctcaxagcca tcctaaatax aagggagtca caactaaagt 

610 620 630 640 650 660 

CTaSCMTGGC? CCGAAXAXT5 GCACCTGGAA TAAAAAXGXX tTTCTGTGAA TCAGAAACAA 

670 680 690 700 710 720

GCGCAAGATG GAXATG7GAC ATXAXCTTAA CACAACXCCA GTTGCAAXTA CXCXCCAGAX 

730 740 750 760 770 780

CACAGGCACT AATTAXAAGC CAXATTACCX TTCTTCTCAC AACCACX75T CAGCCCSC5X 

790 800 810 820 830 840 

CCTTTCrSTG GCACAAXCTG GTXOfASAAC AACXTCCXAA TAAtfCXCXAS CCSAAAAAAT 

850 860 870 880 890 900 

TTGAXGAGGT ATTAXAAXTA TTTCAAXAXA AAGCACCCAC TAGATGGAGC CAG7C7CXCC 

910 920 930 940 950 960

X7CACAXGTX AACTCCXTCr TTCCAXATGX TAGACAXTXX CX7TCAAGCA AX7XTAGAGX 

970 980 990 1000 1010 1020

CTAGCTGTT7 TTCXCACOTT AAAAATXCTT ACCTACCAXX GCTGACXTCC GGAAAAGXGA 

1030 1040 1050 1060 1070 1080 

CTTAXAAGAX HCGAAXTSAA TTAAGAAAAA GAAAATXCXG TCTXCCAGGX CG7AAXCXGC 

1090 1100 1110 1120 1130 1140

KTCCXCAXCX TCAXXAACAC TCANCTAGCG CTTTXGXGXT TGXXXXAITG XAGAAXCXAX 

1150 1160 1170 1180 1190 1200

ACCCCAXTCA NACAAGAXAC CCAGACXCTG CCCCACAGAG CCCXCCACTC AAXXCXCAAX 

1210 1220 1230 1240 1250 1260
CCTCC=Ar=A TCATCACUCT d^WTC ACCACTAHHC T=CTCCT~T TCTGTATAAA

1270 1280 1290 1300 1310 1320

TACVficrSCT AZAACCTCAC CATSACACAC ACATCT~C3 — CCA^T j " ° 

tccttgSS kcttc££ actaacSS a«eaS aaagaaaS wrrrwS? 

1390 1400 1410 1420 1430 1440

*■*■*•'"■"- »•*•• »>-. UVCT CnAACAS? CAAAAAATAC 
1450 1460 1470 1480 1490

AAAAAGCAAC CCACGTCCAT GTSTAA3GCS ACSCTCACAG G^AG^ GAGGA^ 

1510 1520 1530 1540 1550 1560

TCGGCCCAGG AG~CACAAC CAGC7TCSGC WC57ACCU CACCCTSOT CTAT7AAACA 

AAACAAAaS ««SS AAGTATTT7A XAXCCA^ J _ ^ 

nurcaS ata-a^S? tatoa^aS weuoK ncTCA^S? ttatc-!?? 
ttccca^ aaxcc— taccc=aS2 tereuSJ taaaacSa aaacaaaaa^ 

GAAAAAZAAA e^^gg ^."S Ar== ^™ 

c*^r£S ^aaI^ aggoc=^ ^^-2 ^ 

1870 1880 1890 1900 1910

CCCACSASTT TC5CTCVCT5 C\C . 19 "° 



caccac^ eacssuS ^oacx^ «caca^ 
— ^ cc^cSS jcra^Jg atcacttSS 

CCTCHC^ CSTSGSCAAtt A7SSC3AA&C CCTSTCtS CTAAAA^ 

130 200 210 37n 

ccakaakhwa cc=rAC«r S cxcccsc** ec^S cac^S csu^oSS 
ccAcsAra*. htcttsaacc cu^cccac a^^~° garctga^? c=t«=a^ 
cactc=a£S recced AcradJ" tct=tc=J!S aacaaaaaaa aaatctsSS 
mraJS TTSTCccAcc «r=ax5? «»ug 

secret a«a« tctattstts crscrcr^ TaaswiS -^c-^Jg 

490 500 «1Q .-> n < 1n 

taactt^ aat^cccct crsrc—a caa^^ ccr^cr^? gktascS? 
curattS c^tctac^ c^r^I? acrcrs ^ — ^ 

AGSTCA2AC? ScnaS! ACTAT^ TC^SS 

S70 «SQ fig 0 7q Q 

crccracczr a^taacta gaactcaaag saacttaasa ctacagt?^ txctaagcct 

730 740 750 7«n 

TWGWAACC ATTAXATAGC CTTCTXCTAC CAACTCTTC? CdAICA^ «?T7*t2a 

_! 9 ° ..... W0 « 0 « 23 „„ MQ 



CAAACCCTHT CAACSAAtSG TAIAAAMACC AAAAATAAt? CAX. 
10 20 30 40 50 60

~ - — — =^ c acagtttctg ccrzac^a racxsrss ccaxrrrcaA 

70 80 90 100 110 120

caaaccatto rcAocTcrrr caxrcrccar cac=ttc= crrcirTrcr xcrrrccSc 

130 140 150 160 170 180

ACAzzarcrr ctacagcctt ttatccacca ATracairc caicutttt Ata=e=AccA 

190 200 210 220 230 240

rxrrrscscr iac^tccca icsjBerrsc rrcrrrsacr ataacaaaa; ctkmacca 

250 260 270 280 290 300

CAAACSTSAT TTCCTGTCTC CACATC7AAC AAArCAACAT CSCCSCCISS ACTT~TCCAC 

310 320 330 340 350 360

CCA AGTCTTCCT3 ACCACCTTCC ACTXTTSCAC TrrSCAACCA CCT=C==ATA 

370 380 390 400 410 420

caaaacsak ttcaacaiac ttcatcscac rccAcrsrcr cctcsctsca gaaaczacca 

430 440 450 460 470 480

SArrrcACM acsacctcaa ccacatatca tagccscsca acttsctsto ccsca-cacc 

490 500 510 520 530 540

AGCTTSACSC CT3CTCACAC AC73ACACT5 CSAACTCrCA C3AC=U~=r 

550 560 570 580 590 600

TAC=AAGA« TTACS-rCAAC TCCTTTAAAC CAAACS5AAC TCTTCArCTT AAACTACACS 



TTCaAAAICA ACCCAAXAAT TCTG7ATTAA CTSAATTZTS AACZ 



610 620 630 640 650 660

AACTTTTCAC GACSTACTCT 

670 680 690 700 710 720

CACGAACACC AGGCACCAC: ACCACAATCG CCAAKSACA CSTCC3CKC CCTrCSACCT 

TcccrrrSS ««sa.!J! "! . . 760 770 ™ 
10 20 30 40 50

c=a:cc==c= c=rr=c=T cccaaactcc rccsAraca cssrcum 
scrwcS csuxasS cacrr^ orct^ n^** 

Acrw^Tw ATTcccrrcr cscanrau Maaagg rarcrcw« wrcc=cc|22 

190 2C0 j»o 33fl 

C==CAITC7G ATAACSAZAA «TCACAtTA TAOUTmSc CAAAATtSJ ACC3C7AGAC 

2S0 260 270 -jfln 

AAATSATTTT A75AAAAZA? AAAGATTACJT sau^gj cCCACCaIH auUUUtfJg 

310 320 3»o , c - 

CT^C^ TTCC^T-^ K^ACtOA 

«-3 n^^iS .^"0 

430 440 45Q 4*fl A -n 

<=AAACACTC CTACCTTCAC CT7A=A0TAA ATTT3TCAGT TAGTTSAAAG TC=X«cS 

490 5C0 51fl e-n ff-irt 

T=AA=ACA7T CCTSOSnac AAATTCS7CT aiAACxfe? TSAXT^S AAAI0JRIST? 

!!£ 5 «? SS0 530 «flo 



AcrAcoArrr mcsucuc ccArsAccrs =t=aaa«« ;v ~ a j£2 taatctcSa 
c=c«S? cacucJI? Trrcc^lg rc=c=rc£§ cacaacaJS? 

<a»cx u£ nrari ji SS crauswiS rexsseES kk-rSS aacac=a£J 
— - j. ccrrrrcaaA AccrcucAc gctttcxttt 

CTATTT^? ACAGT?^ cmksE ACACTTA^? CCAC77CC7? 
850 860 370 flon ««^* 

6TCC37ACTT CCAGAAX=CA CAGATG7CTG AGGACAaS CCTCAGC^T ACT....!?! 



PROGRAM blastn DATALIB nr Begin >101i.con




PROGRAM blastn DATALIB nr Begin >1024.con



PROGRAM blastn DATALIB nr Begin >102*t.con



PROGRAM blastn DATALIB nr Begin >1058.con
ATCTGTSCTAGGTAG7GTA CTAATCATTCACTrrrArCTCATTTAATCTMirArGir A jr«rr a i /-- ^ 

TACrcp^CTAGTACCrrcACACTACTAACT 

AfiATTAcrscc2ic«TcrcACTc<maaaa^ 

TKTAATArmTAXTACCATA^^ 



PROGRAM blastn DATALIB nr Begin >1092.con








PROGRAM blastn DATALIB nr Begin MllO.con




PROGRAM blastn DATALIB nr Begin >1127.con



PROGRAM blastn DATALIB nr Begin >Ilfi3.con




PROGRAM blastn DATALIB nr Begin >U9.con



CACCCAGTAAAAC7TATCTCATGAGCATAAGCCTGAATCGCATTGACAGCCTACAGAACCCCGA™' 
TTtAXCXTGAGGGCArrAGTGGGGGTrGGGGCTrAGCTACrCAAAGTTTAAGGAGGTGAAAGG^ 
AGCAACTTGTCCCTTACAGGGTCAAGCTAGGTCAACGAAATrCCCAGGAGCGTGTGGAAGCrCTC 
TACC^GATAGGTGAGC^CAAGCTTATGACCGCCCAACCT^CTCCCCAAGCTTCCCTTCCA^C• , ^' 



TCCTC7TGATTGACTTCCACAGCAAGG7C 




PROGRAM blastn DATALIB nr Begin >128.con



TAAAAA CwAAA - ArrTTNTAAAAGAACTTGOCGTrJTAATATei: ViT a Ar^ 
TACAACTTATTGACCTTTTATGC-tf^ 



PROGRAM blastn DATALIB nr Begin >150.con

GAGTATCTGACACCTAAGATTG >. i meTTr-rrmt i 

TAGAAACCCCTTGTcrorrACACTGSS^ 
GAGAGGCGCAGGAGCCACAAATAAAGCAAGAGCCAGAATCACAAGNGGACGAAGAAGAAAAGCAA

CAXAAAG^GSAAifACCAGAAC;^^ 

GAAACC7ACTCTGACGCCCA7CAGC7CrGC7CCAXCrG7TTCCTCT^ 

C7AACAC7CC7CGGGA7GAG7CTCCCTG7GG7A77AXTAT7CC7CATG?AAACrCACCAGATCAA 
CACCAACCTGAGGAGCArAGGC:nC\AAATAGGACr;UG7 

TGCTCAGCCTAATTCTGTGAAGAGAAAGAAACTACCTGTAGATAGTGTCTTTAACAAATTTGAGG

A7GAAGACAG7GA7GACS7ACCCCGAAAAAGGAAACTGGTTCCCTTGGATTATGGTGAACATCAT 

AAAAATMCAACCAAAGGCACTG7AAACACTGAAGAAAAGCG7AAACACArTAAGAGTCTCAT7GA 

GAAAATCCCTACAGCCAAACCTGAGCTCTTCSCTTATCCCCTCGATTGGTCTATTGTC 

TACTGATGGAACGTCGAATTAGACCATGGATTAATAAGAAAATCATAGAATATATAGGTGAAGAA

GAAGCTACATTAGTTGArTTNGTTTGTTCTJUCC^ 

AGA^GATGTTGCCATGGTACTTGATGAAGAAGCAGAAGriTTTATAGTCAAAATGTGGAGATTAr 
TGA7ATATGAAACAGAAGCCAACAAAA7TGG7CrTGTCAAGTAAA^ 

CAT^CAGATTTCTICTTTGCCACCCTTrrAAGGACTTK^ 1'T KCA AGACAX 
TG7GAGA7C7G7AA 1T I V1 ^ T 1 17 77G7AGAAAATGTGAATIT^ 

GCCCTGTGTACTCCCTTGGTTGTAAAGTCATCTGAATCCTTGGTTCTCTTTATACTCACCAGCTA
CAAAT7AC7GG7A7G77T7A7AAGCCCCAGC7ACTG7ACACAGCCr^^ 
TGC^GATT^G^TCCrrGTAAATATTAAAACGACTCCCCAArrATTTTGCAGAATT^ 
TTGAAATGTACTGTATACCAACCAACATGAACAATTTTAATT

AC^ACCCCCACTC7C7777CA7CAGAAA7GGCAAGCCCTTG7GAAGGCATGGAG7^ 
AA^GCAAAAATTAC CAG ACAA7C CA77 CC7AC7G7AT77C7G7A7G AA7G7G7TTG7G AATG7AT 
C7crTAAAACrrC7T7C7777CCC7AAT77GC777GG7GGGG7CCT7AA^ 

AA7AGAATTG7AAAGGAAAAG7 GG7ACTG77 C CAACC7GAAA7G7CTG7TA7AA7TAGGr7AT7A 
G-^CCCAGAGCA7GG7G7TCrCG7G7CG7GAGCAA7G7GGTTTGC7AAC7GGAT C^GU A T1"I CT 
TATTAATAAGA7GGCrGerTC\GCTTC7C7TT7AAAGCAATC^ 
TAA' T '77TATT0CrCAGAAA7GAGaCArATCCC7AAAAATCCrGGA 
T7G^C^AA7TGGTCC7TAG7TTAATTCrA!rrGTA7C7G7TrA^ 

<^AAAAG7G7AAG7GAAAACCCCCrTTAAAACAAAACAAAA^ 

GACAGAC^G7GAGAG7TTTAC\AACATGA7AGGTATTC7GCTCGGCAATrrcrrAAOT 

TAX7TAAGGAXAAAGG7AAATCATTCAAGGCAGTTACCAACCACTAAC7ATTTGTT7TCAT^ 

G7C77G7AGAAGC7T7ATATCTTGTTTTACC7TGGCXCATTAG7GTTTAAAAAXGTACTGATGA^ 

G7GC77AGAGAAAT7CC7GGGGCrTTC7TCGT7GTAGATCAGAATTTC\CCAGGGAGTAAAAT7A 

CCTGAAAACG7AAGAAGTTT7AAACAGC7TITCACACAAATTAGA 

AGG7ACT7ATTrAAAAGAAAGGTAAAGATTGGCCTGTrAGAAAAAGCATAATGTGAGC^ 
TAC^GGA77r mTlT l T l ' 1 i' A AACAC\CCTGGAGAGGACATTIGAAAACAC7GTXCTTACCC7C 
GAACCC7GATG7GGT7CCAT7ATGTAAATATrrCAAAlATTAAAAATG7ATA7ATCT 
AAAAAAAAAAAAAATTCCTGCGGCCGCAAGGGAATTC 
PROGRAM blastn DATALIB nr Begin >170.con




AAT<*TA.VCC\CGACCACCACCCATXCAG^ 
TTAACCCTCGCOTCCTCCATTAGAXAAXGGCrCA CCT ^ T< ^ CTCCACCAC =^CCS^AG 






ACGG7GC77 




CG7CT7GACAAG7T 



CATC— ACCACAAA 



ATCCCAGTTGGTAATAGAGACTrTACTCCTACCTATCAAA^^ 

GACATGT TGrAC ArGTrAGCArCATTCAAArAACCAAGArTArAAGCTCAGGAAAGArGCC A 

ACTCATrrcrTTTCTCrCTCATCC^GTTGGTTCC^^ 

CCTCTCCAGGTCTCTTCCAGGCCGGTCArAGACGTACTCCCTCTGAGGCCGACCGATGOT^AGAA 
GAGGTGrcrAAGAGCGTCCCCGCTCAGCAGCCCCAGGCCrC^GCTGCtCCrCTGCAGC^^ — ^ 



CGAGCCTCCTCCACCCACTGCCArCTCCCAGCCAGC^XCACCTXTCCAAGGGAATGCATTCC^CA 
CCTCTCAGCCTGTGCCAGTGGGTGTGGTGCCAGCCCTGCAACCAGCCTTTGTCCCTGCCCAG^C - 
TArCCTCTGGCCAArGGAATGCCCTArCCAGCCCCTAATGTGCCTGTGGTCGGCATCACTCCC*'^ 
CCAGATGGTGGCCAACGTWTTTGGCACTGCAGGCCACCCTCAGGCrGCCC\CCCCCA^CAGTtLlc 



CCAGCCTGGTCV3GCAGCAGACATTCC:TCC\CXACGAGCCA^^ 

TrTAAGCCTCCTGCTCAGCACCTC\ACGGTTCTCCAGCrrTCAATGGTCrACATGArGGCAGG~ 

GGCCTCAGCAGACAGGCATACAGAGGTTCC^ACAGGCACCTGCCCAGTGGATCCTTTTGAAGCC" 

AG7CGGCTGCATTAGA-V\ArAAGTCCAAGCAGCGTACTAATCCCTCCCC7ACCAACCCTTTCTCC 

AGTGACTTACAGAAGACGTI^CAAATTCAACrCTAAGCAATCATTATGGCT^ 

TACCAGAC\GGGAGCAGGGGGTAGCGGTCAAAGGAGC^AAACAGArrrrGTCTCCr 

TCTTTTCACTAATCCCAAAGGTCCCAAGGAACAAGTCCAGGCCCAGAGTAC7CTGAGGGCTGATT 
TTCLUACACATCCGAAJSJ^GCACTC^ 

CGAAATGTTGCTTTCTCTACrCCCTCrTCC:^ 

AAAGTA'TCTGAACAAGAAT CTATATT CCAAG CA CATTTACT GAAACC7AAAACACAACAGCAAG C 
AAAGCAATGTCCCTTTGCTTTTCAGGC^TTCAC:^ 

GATCAAGAAGAGTGCrTTTGTGCrCAGGCrGGGAACAGAGAGGCACGCTArGCTGCCAGAATTCCC 

AGGAGCCCATATCAGCAACTGCCC\GCAGAGCrA^ATTTTGGGGGAGAAGTrGAGCTTCCATr^ 

CAGTAACAGAATAAATACTACATATATCAAAAGC CAAAAT CTTTATTTTTATGCATTTAGAATAT 

TTTAAATAGTTCTCAGArArrAACAAGTrCTATCAGTTGTA^ 

GCTAGTTGTAACAAATTGTACATRAGATTCATTTATCAXTCA^^ 

AAGGCTGGAAGCATGCAGACAGGATCCCTAGCITCTTTTCTGXCAGTCATTCATT 

CATTGCAJiCAJ>CAJ^TCATGCr7ATGAC 

GAAAAGCAGTATTGTCCTGGTTTTAAACCTATGATGGAAXTCTAATGTCATTATrTTAA- 
CAATCGAAAT ATG CTCTATAG AGAATATATCTTTrATATATTGCTGCAGTTTCCTT ATGTTAAT Z 
CTTTAACACTAAGGTAACATGACATAATCATACCATAGAAGGGAACACAGGTTACCATAXTGG^ 
TGTAATATGGGTCTTGGTGGGTTTTGrrTTATCCT^ 
ATGGGGATTCTGGTTTTATTAGCrTTGTGTGTGTC^ 

C^CCCTTGAC\GTTGCAGCCTCTrGACCTCGGArAACAATAAGAGAGCTCATCT 

TTT7GAACGTTGGCGCTTAC\ATCAAATC7AAGTTATArATATTTG7ACT^ 

TCTGCTTTAACAAAAATAAATGTTCATGGTAG 








PROGRAM blastn DATALIB nr Begin >253-325.con







PROGRAM blastn DATALIB nr Begin >293.con

GTGCArGTAATTACACx 
ACTTAATCTAGTTTC 
AAGCATAAATAGACA 







GG AAATAAATGAGA7 CTCAG7GGTGCTA "*< — it" -, , „ 

GG AC CSCAAAA7C AAAGCCAC ATC C^CT^G ~ - ~-*^ir ^r-— -^^^ C"* t *rG7JT7GGAAAAA 
CAGCTTGGTCAGAAifTCCTlr^ GAC ^CAGTC5GNTCTAACTGACC^TCTCaXSc 



PROGRAM blastn DATALIB nr Begin >295-43.con



AAATCCAGTGCAGCCAACArTATSTCGAAATAGAAACaGGGCTCCTCCTAGGAGA^ 
GGCrrTCCTTrSSAACCCCrCACTCACTCATCCCCCCTGAAflCAGG 

TCCCCTGCTCCTJrrCCCTNCCCCAGGCCGAGATAGGAAACCGGAAHCCTGGGCAGGCTGAJICCCA 

MCCGACTGGAACCAGGGIIAGANCCTGTGGGTGGGTGGNAGGGAGGGAAGGAGGCCAGATTCCrCC 

AGAACTGGGGxLfcGAGAACAGGTTTTGGAAGTTCCGCCAGOGTTTGGGTTTCACX^ 

ATCAifACCCTGGAGGGTI^CXCACTCCrCGTSCAtfTra 

CTTCCTTTCAACCCrCCTCNTAAAAAGTTrrGATNTrTTAAGG 






ACCAAGAGCCCCCAG7TTATG2(7AAC7CTCATGACAAACACAArr^^ 
CTATCCAGGAACCACGANTCACCTATTACTACKTTCCAGCAGAAr^^ 

ArCCAGGG7AAATCCCrGACCATGTGAGAGGAATCCTAGrGCCCCAACAACCrCACCCCCTGAC7 
CCTCCTCAANGGCTCTGCC^GTCAACAAAAAAATCC^ 

AG AC CAGCG7 CAA C CTAAATGT C CATC\ATAAGCC AATGGTTGGATAAGTAAAAATT AT GCAGCT 
GTAGGAAGGAAT G AAG AATGTCTAT 






PROGRAM blastn DATALIB nr Begin >310.con

AAAGGCAACAAAAGCTGGTACCCGCCCCCCCCTCGAGCTCaACGGTArCGATAAGCTGOArA^CS 

AATCCTCGAGATCTACCiAXVAAAAAAAAATTAACTTCCCAAATGTGGGAGTCTACTCrGTTCCC 
TCCTNGT^riTTAT^rCTCT^ACTTTYCT^^ 

NACTNXGSKGTGAACArTr^YCTATTATAAATYCTWAGAAAATATTIfCTArGGirrATGAGATAT 
TXG ATT CCAAGTG CCTXGT AATTTACTY CTCAAAT GTC C CTG ATGTKGC AW ATTXGTTMCTAG~G 
TTYCACTATTTAAAAAAACACNAATATCTGTCTN^^^ 

TT?fCCTTAGGaTAAAATCCTAGAAGTAGAATTTrTGGGGCAAATTATCTACATATTTATAAT^^ 
CTTGGTATTCCAAATCTCGTT^TCCAAAAGCTTATATCAATTTGTACTTAACACCAG' 



PROGRAM blastn DATALIB nr Begin >222.con



AACTCCACACAAMHAAAATAAW^ 
AraCACTCCfcKKUAACTWXAA^ 

ccacActTccrcccm^c^^ 



PROGRAM blastn DATALIB nr Begin >323-1127.con

CSAGATCIGCCCCACCCCACATCTCCTTTGTTGAATGAGTAGAGAAGACTGAGAAGTATCACT^ 

TG^^Cww- *^»Gv.CCwi.CTTTCCTGTGTCCAGGAATAACCTTTG2ITACtACCCAGTCCTCT^a 
AAGATTTGTAGAAGC??AAAG7CGAAACGGACTrCSAACCTCATAGA^ 

AGCATCAA&SAArrAGAAGTCCTGACAGATGAAGAATGTTGTCITCCCAACTCAAAC-^^^ 
G^CCTACrrAAGCCAACAGAGGGGCTCCTCATr-rrCACTT^ 

CITTACACAGGTCrrCAGATCAACCSTCACAANCCAGANTTNCATGTTGGCCTCAGGAGGG^cr 
AGGTCCAACArCXCGAC^TAAGCAGCGTTCCCAGTTCTTTCATCCTCAGATAACACTNC~AAC~>i 

ATlrCTGrTAACCCTCrKQCCAGANrrAGAKCrGACTGATITCACrrcCTAG 



PROGRAM blastn DATALIB nr Begin >424-216.con




AAAAACACCCXiAHC x. CAG kTCTNTCTCCCTAATCCTCTAGSAiGf AAATCNNAKRJTTVfNACr^ 
GN^CTGTGC^TNA^IAi/ATlfT^rCANTTGrArrTArGNACTCC^CATXGAGTA^SSSll 



WT>rnrCTIICTGCSriAACMCCCSCMCCAHTTTTrNirrTGNTCAKANACARCAATGCTGCCATAOro 
TG 







PROGRAM blastn DATALIB nr Begin >424-227.con



CCAGACrTrCATAACT^rCTCTTAXTATGAAGATTAGACTrrCTGA^GCrTACTCCATTAGAAGAGN 

ACGAGGGCGTAGCTGCCCCAATArATTCTAATTTCTCTKGACGACCACCAAATNGGStAGAGTG^C 

TCTGATAGGGAAAAGGAAGAGTTGGAACGJlAXCTTAGCCTCrAGGA^AAAAGA^CCATTTTTAn 

GGCCXCCAAAGTTACAXCTAGTXGCCTACAAATTTATKTCCAAACTCCTTATCCTGCC^^CAG 

GGTCCTGtfAAACTGATGCCAAACrATAGTTTAGTCTNCr^^ 

AATTATCTCGG:*AAACAGACCTGArCCAAACACAGTTXGGT* 

AGC CTGT¥ C CGTCTACTNGGGGTGTCrTXGATTTG CTCCAG 



PROGRAM blastn DATALIB nr Begin >450.con



rrrTTTTCCACCAGACTTACCAAATrrTACAXG.VATCC^ 

TCTAraCATNGACCCCCACCArTATSATAGAGATCATIfTCGTGAirrAArGAAAGATGAAA^-^ 
AGCTGGGAAAGTAANAAGGAArAGGATGTAAGTATGAGCrCCTGrrTTTTACTATVT^ 

GCCCCCTCAGAAAAATATGMAA^CGGGTAACTGACTIfGGAAATGGGTHTTTTATGMAT^ 
CCCACrCACGAGGTTr ****-*i.w*iAiAGi.AAGT 



PROGRAM blastn DATALIB nr Begin >452.con

T C C AG AT CTAAAGCAC A.TC ^ AG A CTTT!f CA C?r AAATAAA TTTA CTG CTTTTTTY CTGTGAN ATI i 
GTTNCGACA^GCiUAGCrrTXGATTKCr^^ 



PROGRAM blastn DATALIB nr Begin >435.con
CTTA(r rrrTACAcr ACT7TCArrA7AcrAccA<rrrrcrAATATCTTg^-^ 

CATATATTTSTTTTAATATATCTCCTCSTTra^ 
AAGAACATTATTATrcrrTTAATAACTrcciT^ 
«XCTTCSCTATTTTA3AAAAACA«aauUM!CIC^ 

TATCTT7AACTTTTGGCCT:tGCATC^^ 
T?77CC^GTT:rOGT7CC^7GTTTC^ 

AGAGGGGGAAACCAACTTTCCACTGTTGGAGAGCACTGNATAGTTTATGJIATTGT 




355: SEQUENCE ID NO. 46

c^AAcrr^AT^ic- i - — * * « 

ACCACCCATO 
GTS 





3570 SEQUENCE ID NO. 47
<^^^_C7ACAATAAAAC^^ 

^ 7 ^^E?~ C ^ : ^ CTGGCr: ^» — ^ - * ^ AAAXAArtATAACTCOJCAT 

3571 SEQUENCE ID NO. 48

J Wau ^^ -i^CTG au * . * LAGAA7?TGAACT?G7GAAGT^CT3AAAAAC7ACVTGG^ST 

jc^AAAocuAcrrrrA£^rr^ 

"^AAAGCTSrA*^ 

ccca£Ow * c^^xcrrrAiackrrGTCTu-icg ^i^ j; - iiA Cur^CAcrAGCACACG-rg 

AG7C7:rrGCACTGCAGAC^TATGCCCCATAGACCT77AA 




rrr?c_ _ 

GAAGG7GAAACAAAGC 



G7G7GTA7TAAAAAA7~A 




~A^CA7AAAC77TGGG777CA?GA7AC: 




SACCC7GGAC7GCAGTAGTGTGA7 



7CAAGrrA777TC 



'CAG2TC7CC7GA 






35.73 SEQUENCE ID NO. 50



CAAAAAA7CAAkGCGAAGT7 

gtgtictgc-zt 



ccrcACActAcrrc 



rrcTccA~ccrcAcrcrcc7r-CT3 




, ^^AAOTTCAGAArrCAGTTAATACAGAArrA 

rc^CGT^TACTTTAAGATGAAGACTTCCS a .^.x. 



:rcrc 

7G7TCAAAA7 — rJ . . J L IT 
SGAAGGAT77GGAAGGAAC 



- - GG7AACGG7AC7CCTGAGAGTTCGCAC7^C^nGAAAA7CC7 CCTG7 

. CA^CTGCT^rSSGGGAC^GAAAC 

rCAAACCTSGTACr: 



A2GU7AGGAAA7CACTTGGC: 



C7CC77GA 

'GAGGGACACAG7CCACTGCGA7GAAG7A 
'CCAAAGTCCAA7AGTG2IAAGG7GG7CAA 
TAAAAG7ZAGN*CGGGAArC7TGAT77GG^7AGN7T:CG 



3 = 75 SEQUENCE ID NO. 51



GGAAAGAGG7r7rZ7AACVC7CAGACAG7G7AAAAA7CCAG A , 

GAGACAGAG7CTCGCACTCT7 ACCTCACGCTGGAGTCCAGTGGCAC 



3535 SEQUENCE ID NO. 52



AGTCCCAGC7ACTCAGGAGGC75GGGCAGGAACA 
GrGTOAGCTATGArCACAC 



^TTGaGCCTGGGACTTAGAGGCT 
TGGCrCAACACAGCAAGACCCrAAAAC 



C7GCAC7 

7AAA AAACAA AAGAAAAAAAAAA7A7A7 
A^CApTTrcCAGAOG7Arr:7GAAACCCAAAGrr^ 
Tw- i i. ^£^^G^AAGCGGGAGAAGAA7C\7CATACACACACACACAC7TATACATACAC 
A7ATA7ACAAAAIACA777?TTAA7ACACACATA7AAACA75GAG7ATAGGCA7AAC\CA 
C7GrrCCT7GA7AAAA7A7AGGGATCC 



3535 SEQUENCE ID NO. 53



TAXAT7TJ7A7CAAGCAACAG7G7G77A7GC CTATAC7CCATG777ATATGTGT37 A77AA 
AAAAT C7A? T 77G7ATATA7GTG7ACG7A7AAG 7G7 U Z' G TG 7G7G7AT3ATGA77 C7CC7 
CC CC? N 7TGAAGGTGAAAGAAAGCACAC CTTTATTTAAGCATAAACTTTGGGTTTC ? AGA 
TAi- w 4 ^rGGAAAAA7GA7T 7A7CTCCCAC7TTGAAArrCCAAAATACGTACA7A7^ 

1*1 * :AG777?IAGGGTCr7GC7CrrGr7GCCZAGGC7S<3ACr7GCAG7 

AG7 GTGA 7CA7AGaTCXCACAGGC7CTAACTCCCAGglTCAAGC7ArCTTCC7GCCCCAG 
KCTCC7GAG7AGG7GGGAC7 




3S2: SEQUENCE ID NO. 54



CTTTGGAACAAAAAAACAAAAAGA^^^ 

AAGGGCmACATATTAAT CT C7GAAAATGG AT - - G7GCG< ."GATACAAGCCAACA 

GC CACAGAG7ACG7AC G7GAAAGCTGCC7GGG7 TTAA7GGC7GG? AG7AXGTTC7AACT? 
GTTCACG 7 ACCGArG7CAC?AC7GG7GG77ACAGAA7G7GAATC7C\CAC7^ AAAZ 
C7G7T7TAT77TTAAAA ; GAA7AATTC7A 7 TACATTAC CTTArAAAAAGTAGCTAAC ~A 
ATTTTC^rTTTTAAAAGTGAATTGAGGGCAGATGCAACTGG ? 7CACACGTAT7AATG G GA 
AATACCTTGCAGAGGGCAACGTAC^GGATTGGTTGGAGCCGAGGAC^ 
C7AGGGAATAT7G;tAACAA2<G7CG7C7CTACAA 7 AAA 7 AA7 



3533 SEQUENCE ID NO. 55

CTGC\7GAACCTT77?T77CT7T77GCTGC^ 
C«373aG75<21ACACTClTAGC7C^^ 
TCCCACrrCAGCCTCTCAAtrrACCrACAACTAC^ 
77G777A77:iG:;GGGAGACAGAACG77Cr:GG7A7A7r^ 

77GGGZJ77 CAAGCAATCrTCCTACCTTGGCCTCT? CAACG7A? T7GGGATT7 A7AGG7G7 



GAGCGACTTGCAt— GCCCTCSA— CACrrr: 

7A7AACG7Art7G7AT7AGAA77Ar7 C77?£72TAA?fcAA7AAAACGGAT77GGGAAAGIIG7 GA 
GA 77GAC^77C7GTAACCACCAG7GG7GAAA7GGG7CGGCGAACAAGG7AGAACA7AC7G 

;?G77CAAG7CG377 G 



3! 33 SEQUENCE ID NO. 56



GGArtnrrGTTTCrrAAAAC^UAAAAATT^^ - A 
TTGTArTAAArGGArCnrr^rr^TCTTCA^ 

7 G7GG7CAAATAGC7AGTAAG7GATGAG7A(Kj CTGGGC3CAG7GG2r7CAAGC77G7AA77 
CGAGCAC7GTG^C^:C^C-GC^CAGAT^ 
T3GCCAACVrGGHAAAAGC7CG7rrC7AC7AAA^ 
(7TGC3CAC— GTAGTCCGAC<7ACTCGC^GC^77GAGGO 

rTGAGAXCACGCCACT? GCACTGGAGGG7GGGNAA 




CGG7AGACXTCT3 



3550 SEQUENCE ID NO. 57



C7CCAG3CTCAAACCCTTG7C77GGCA7CAAACAATCGTCC 



77GA CCGT7CAAA G7A 

^7ACAACTACAGGCA7GCACTAC CA T GGCTAA777TT7AAAAAAAA*7T7TITTTCAGA 
GATGAGATCTCACTGTGTTTCCCAGGirrTGTCCGGAACTCCTGGACTCAAGCSATCC'rCG 
CACC77^CGCrGCCAAAGTG77GGGAT7ACAGGCATCAGCGAC7A7GCC7GCCCA7ACAC 
riV: -ill-.Ul. JA AMCAaGACGCAGTC7yG77CTGTCGCG^GACT5CAG7CCAGCC 
SC^BUTCrrrGGCTCACTrCAAAGC^CGCGTGCC^ 

CAGCCrCCCAAGTriGGTGGGAC-ACAGG^ATCTGCACCACGNCCGGTTA* 1 T-=J I - G^. -T 
AT 







3=2: SEQUENCE ID NO. 58

CAATTC CAGACGAGC "COCAA CACAGTGAGACTCTAr -^CTACAAAAAAAI^TTAAAA 

7TA(^AAAc~r-A~c^^rsc:rrcc;^ 

GATAGCTTGAG C C7GG GAGTTAGAGGCTGTGTGAGCTATG ATCACACTACTGCACTCCAG 
CCTOCS<p*U^CAGC^ 

5*^"<^^^*^AATT7 GX*AG7GGG AGA7AAATCATTTT? C CAGACACT^TT CT7CAAAC Z G 
AAAGT7TA~Cr:AAArAAAGGr3T jCrrTC I\ Z CACrrTCAAAlIGCnGCAGAACCATCA 
TCATtZ C ACACACACA CACT 2TArCAT?rCACA ri " l ' J ZA CAAATN CAArT?nnCtAA7ACAACA 



3552 SEQUENCE ID NO. 59

GGATC77G777 CT7AAAAC£GAAAAAAATT7ACTGA7AG ? ACA7TG7TC7AAG7 G7A77A 

TT"ATTAAATGGATCAr77AA777AA7C77CA7AACT^ 

7G7GG7CAAA7AGC7AG7AAC7a\75AG7AGGC7GGGCGCAjG7C<:C7:^ 

CCAGCAC7CTGGGAo^C^CGCAKClGArcXCTT3^^ 

7GGCCAACA7GG2TAAAACC7CG7C7 C7AC7AAAAA7ACAAAAA 

? 7GCGC\C7TG7AG7CCCAGCrACrCGGAAGGC7?GAGGCAGGAGCAA7CGCrrrGA 

?GC^GGGACAGG7TCCrTCTGA7GCTGAGATC\^^ 

CAAA?G27GAGA7GTT?rTC7 CAAAAAAAAATAAAA7 7AAAAMC7GA7GAGTAGGA7T7CGA 
CCGCAGACAXCr^rr r rC C AGGACCTGGyATTC 



3534 SEQUENCE ID NO. 60

GAATTCCTGwC7CAAG7uA7CC7C7CACC7CAG Z GTCCCAAATTGCTGGGA77ACAG7G 
tCAGCCACTGTGCrrAGCrr^CATATArcrA^': 1 : . rAATGAC7GC7AAATC7CATTGTAT 
GAAAA77*TA7G7CC7AGC7A7AAAATT7GI~AGCACA*\j j. ^ 1AA- * . . - .cl-rtATTTCAG 

ArGTrr^^cT^TArrxcruAGTArAGtAr^ 

sett ccrrrrGGTACTCA ; , :A TAGTTATGGc r r GTC er^ c r ^xr: crcATTTArArG 

AArGAr^OGAGCrrCrrArrUGAAAAAGTTCAG^ 
CAALV ^UAGGGAAAAAAGTGAATTAITGG 



3S3S SEQUENCE ID NO. 61
TCAACnACrTCCCrGAATGGACTGCGTGGCTCATCTr^CraTGA 

AAACCCAAGAC7GA7AA : - : iu : 1 Itj ^ CACAGGAA7GCCCCAC7GCAGTG '" t « /-.t TCCT 

CA7G7t7T7TATCr7SATT7AGAGAAAA7GGTAACG7G7ACA7CCCA7A^ 

AT CA7T AA7TAG CTATAGTAAL J. . CATT7GAAGA77TCGCCTGGGCA7CGTAGC7 CA 

TG C— . w rA ATCrTAGCACTrTGGCAGGCTGAgGCGGGCAGATCACCTAAGCCCAGAGTTC 

AAGACCAGCCTGGCCAACATGGCAAAACCTCG7A7CTACAGAAAATACAAAAATT7G7CG 

G07ATG 






SEQUENCE ID NO. 62



:C7AG7GAA7 
GTAAT77AGT 



*CAGCVAC777A7AGCT777AAA; 



"3ACA 7A7j C 77GC7 3GAG7 77AAG7GCA 
7CCAGA7 CAGCAG7GNAACA 



rGAAAAAAAATTACACA 



CCAAACATTAGArArrAATST: 

C7A- . >ATT7AAr^rCTTCrr!^TAAATrAATT7ACr . I . ,l .u. . . 

?Grrr3a^ccAGT3Tr7CArrr:GcrrTG~ 
^<^^^^c^^^K7:rr?5GAGA7CAGG7^^ 

CUAC 






3*10 SEQUENCE ID NO. 63
7AA ACU C\GCCK^7GAGGGCACTAA7CA7AA7a^ 

GACcrrrccAGT 3AA7GGAAA7CA77CC caccacaccaaaatt ctag A7 gaggag7gaaa 

CACTA^7AC7C;^CAGCAACG77A7AGG77T7AA^ 

OGA7777AAAAGA7C7ACAATAA777CCACCAAAAC\77ATT7AGAA7AA7G7GA7jGC 
TCCCAAAC\7TAGArATZAA>r:rrCCCA Lw 1 i ' JA TAA777TACCATAAC77ATA7CAACrG 

TGCTA~ArrrArrTAA^cr:crcTcrAAArrAArr:Acrc: 

7G7G777GGAGCCAG777C7CA'! 



acggctc^actgnagtctt: 



'77T577777G77T7 
TTGGTTGCrCAGGCTtGGAGTAAAGTGGGTGCAArc 

~~~~ :rc 



rcAKGT-GGTcmrt 



3512 SEQUENCE ID NO. 64



G7T7A7CAAG1 



rCCCTGAATGGACTGkTGTGGCT 



UVACCCAAGAC7GA7AA7773T: 



777GGC7GTCA777CAG7A7A 

MGAATsccccAcrGGacrarrrrcr 



TrccrcATcrcrrrArcrrarrrxGaGAAAArsG^ 

AG7AAA7CA77AA77AGC7A7Au7AAC7T 
GC 



"3GGCA7GG7A 



GTAA7777AGCACrT7GGGAGCC77AGGCGGGCAGA7CACC7AAGCCCAG 
7SCGCAACA7GGCAAAACC7CG7ATC7ACAGAAAAZACAAAAA77 



AGCC7GG7AT 



3633 SEQUENCE ID NO. 65
CTCASGS75TTGGC3GGGAG7G7CrT77AGCATOT 

GACC\G 7GAG gA7AACCAGAGG7CAC7C7CC7CACCA7 Ui i rLiU A I IVa STCG G ' m ' T SCC 
CAGC: V L i :TrATT0CAACCAG7TrrATCAGCAAGAXC77TATGAGC7G7Ar L : : L> l 'GCTG 
AC7TC^7CTC\TCCCC:ZAAC7AAGAG7ACC?AA^ 
G7C77GGCC77arr^CCCAGCCCC7ArrCAAAA7AGAG7^7^ 
CnSACACAACCATTT 






3 54C SEQUENCE ID NO. 66

C^GGA^4CGGGGGGATGGTGG?AAAAGTCAGAAGGCGAGCC*A£TA^AGGAGGGAAAAG^ 
AA AGGCA AGTAAAOCGTGCIGACAAAAAAASCGATAAAG.^^ 

CA » . « > 7AA^lGT5GOCCACGTGAArG*», . CC ^ACAA a %jGv» ^k* A* A . ^A*J 1'TGv- - 4 sj^«GG 
CC- - - - GAG I - GGlJAACAACw T'J<^CArCATCACACATAGC"GTCATu « .AATGC7C 
TCCATAGATTACTAArAGATTAtAOGArSCK^T 

TTG7NCAACV7GCAAGGT7ACCC7 C7TTTTT2?CC77AC2ICC CACAAAGCATTGGA^AAGC 

tttgtga . : . ^I ' A CTAGcerccACTrsATCAAArTTAAGCA : : : : » . : l z ^.n : :aac 

A27CCAGGLACAGC;r7T:UACIAAGGAAAr 



3**1 SEQUENCE ID NO. 67
CrGC\GCTCGAAGC\CCrrrrTGXWT7GAGC77rCT^ 

GCAACTAT CTTAGCTTAATTAArAAuA * ^ AAArcrTTGTG7GAGXGGCGr7T3GACG 

ACAGCAACTG7A7C77GAATaGGGGC7GGG7AAAATAAGGCCAAGAC~^ 
777GCAC^GG7TAGG7AC77r:AG77A«GGA7GAGArAGGAA^ 
GCrCArAAACGArCTTGCTOArAAAACTGGTTGCAATAAAG^^ 
AAAACCAAGArC^3AGGAGAGTGACC7CTGGrrArCCrCAC^ 

7A77A7ACAT7AGCA7GC7AAAAGACAC7C C77C CAACAAC GA7GA2IAGG777ACAAG77 
MCCArGGItAACo^fCCCGGA.VGNTA:{CrTG 



3S42 SEQUENCE ID NO. 68

CTGyAGCCTCGACCACGCAGG~GAGG7GATTCrGGTGCC GTAGZT C73\TGAGTAG^rTGG 

GAT7ACAGCCA7G7GCCACGA7GCCGGAC7AA'., . . . . ATAri" AC7A<SAGACGGCC77 

TCACCArCTTGGGCAGGCrGGTCrCAAACrcrrGAC^^ 

C7CCCAAAG7GC7GGGATr7CAGGCGCC7GG>.w. ai * aL-^GAv.'^TA. ^.AAACAAGG 
GGTGGAT^rrCATGAGTrrrcrGGGAAACAG^ 

TG7 GATGGKG77GGGGGGGAG7G7C77? 



3543 SEQUENCE ID NO. 69
CTG CAGAAG7ATGTTTCC7G7A7GG7AT7ACTGCLXTA£X^C7 

ACACA7AAA77C777TCCAC77CAGGG2*CA , rTGGGwCCCAT. • L I rCTGCCTAGAAT 

ArrCT7TCC:TTrCTAACrr:5G7GC^TTAAATTCCrGTCArCCGCCr C^ 

ArArA7AAAG7^TGG7GCC3C^AAGAAG7AGCAC7C3AA7ArAAAATrTrCCT7Tr^ 

T7Crr»GCAAGG!lAAfiTTACrrCTArAXAGA^^ 

CAAGCGCACATTrGGGACAAGGGAGGGGAAACG G TT 1 Z rA TCCCTGACACXCGTGGTCCG 

SGCTGHTGTCTSCrSCCTCCACTGA^AGOT 

ATrGGOTAATTTAAAGAGAA^IArGGGGTGAATGCrrTCGGAGGAGT 

CTAmaGGTAACTTGAATGA 



cT^cACACTx^TTrc\ACTacAo r . : A AGA7AA7G7CACA7A77CA7CTrc::c7T7 

G7TTC7CA77 7ACASAAAAACA7T7T7A77CZAGG7SC CAA7A77CCCACCCAAAAAGAC 

7AGCGAGA7TTG7TG7 j TATGCIAr.CTT 7rAAGGAGAGAA77773C7t^CA7G77 77A7S7 

Ts 7777CC7ACT7C77C7TAC7G7CAGAAA7^AAGGC7AGGGC7CCACC77GGAC 7 

C7:^AC7AAGC7AGAGG77ACAACC7AAAGAAGAAAGAACaACU^ 

ACG7CAAGC7AC 7 77AC7AA77T7GAC7GT5C7AC 77777 ? AC7AC777A7GAGAGAGAAA 

CTA7G7GCA77A77T 



3 $53 ID SC. 73 

c^7uCTcrrrGTCCcrGTOAcrcrcrGcxrGcrrGor^GTCT^^ 

CAG-: . i * A*AC777CAAGGA77GGCAGC7G7AC77AXGAGT77G O I , ' 1 7A 77A77777A 

AAwC7AG7S75C« v. - * .aCAGCA7S7C\77A77ACC?7GAACCC7777GCA77GAA 

GGwGCATGAjCTTAGCTGGAGAGCCCATC C7CTGTGA7GGTCAGGAG 
AGGGG7TA77ACTTCA7 G ; 1 7. ' A AG77GACAAAAGGAACAC7GCAGAAG7A7GT77C 77g 
TA7G<7rA77AC7GGA7AGGGC77AAG77A7GC7GAA77GAACACA7AAA7TC77^^ 7AC 
CTCAGCGGCATTt^GC^CCCAT7G^rr7CrCCr^ 
GC3JGGA77AAA77777G7 



TccvrcrcrACGAcrcrcArGGc^rrccAAAGAAGAG :r: ;AArrG AGrrr TAaAA73TG:: 

AGrrGTGAAGTGTCTGAAAAACrACArGGTGinrTtlAAA^ 
GAGAGCATCrAAGACV^CUrCT^a^AGGCX^ 

TAT CCArATAGGTTAGGGTTAGCrrrrgGCAA Ci; .VriA TAGAACAAACATTCGtyAAGCTA 

CAGACAC\KCCACSrrC7G7C7rC7ACCr^CCACVAAGG^ 

CAAA75777GAA7AAAC7 



3570 SEQUENCE ID NO. 72
GT73CAAAG7CA7GGA7TCC7T7ACG7AGC7ACVr^ 

AAT7oACACTGTIACAG7C7AA7777A7A77ACA7GTAACi * ■ IA777CGATA7A7CAG7 

aata u ' ^ CjCT i7"r c i * : i r: . :a .s. - . „ i i ..mi : : u ksggga^iagagtctcgct 

C7GTCGCCAGG7TGCAG7GC\A7GG77CGA7CTT0GC7CAC?GAAAGC7CCACOrC 77GG 

G7TC\AG7GArrC7CCrGCCTCAGCCrCCCAAG7ACrrGG^ 

ACGCC^GGGA-AATTTTGGGHTTTrr^ 

T3CnCTIGGAACICCr3AKA7CASGATC7K 

CTCAGGGG7GAGCCACT7G7TCC7GGGCC7C 



3£"! SEQUENCE ID NO. 73



rrccACTr: 



-w>*ww>r»7CAAAA-CAGACA 



, ~CCCACCCAr CACA7rAT TCTAAATAArG^ . - Ibw wGAAATTATT 

A - AAAATCTSTGTAAj. V:CAGGCAAGr57TTAAAACCTA3aAC 



TGGACTACArTACTG— 'GCAC7; 



TTCACTGGAAAGwTCCACTTCSAC 
TOwCrrTGGTGrTTATCAAGTACC? 

mcAGTAr 



TGATCT3GAA J 7 . ^ « ' GGTGOGAAT 

rATT ATCATTAGTGC 



rTGCCTGTGA 



3 £72 SEQUENCE ID NO. 74

, , ( , t rCCTGGGATCAATCGATCtTTCCCACCTCAGCCTCCTAAGTAGCTGG 

^Cr^^G^wTGCACCACCArGCCr-GC7AAT7TTrG7 A: i i - X T ^ T A GATACGAGGTT 
4 . GCC\rGTTGCCCAGGCrGGTCrrGAACTC7GGGCTTAGGTGATCTGCCCGCC7C^ 
~Ca\AAG7GCTAAGA~ACAGGeATGAGC^ 
AAACTrTACTATAGCrAArrAATGATTTACrCUUCyuT^ 

TTCTCTAAArCAACATAAA^AG ATGAGGAAAGAAAAC\CTCCAGTGGGGC\TT CCTGTG^ 
CAAACAAATTATCAG-rcrrC-GGTTrrAC? \?XZ ACTGAAATCACAGCCAAGArOAGCCAC 
GCACrCCArrCAGGGAGGTACrrGATAAA 



3571 SEQUENCE ID NO. 75



rTCTTGCCGTTCCG^ACGCGAGCGTGGTGCCCCTTCCCCATTATGATCCTr^TTCGCTTCC 
GGCGGCArC3GGArGC3^CGCGTrGCAGGC3AT2ICTGTCGCAGZTCAGGTAGACGACGAC3 
ATCAC<^CACCrrCAAGC^ICGC?3GCGGCrcrrACC^ 
CGCTGArCSTCACGGCGATTTArCCCGCCTCGC-CGA^ 

TTCrrAGGCGCCGGGCXATACCTTGTCrGCCTCCCGGGCTrrGCGTCGCGGTGCA 
CGGNCCACG7CGACC7SAA7GGAA^ JCCGGCGGCACCTCGCTAACGGATTCAGCACrC 3AA 
GAATTGOAGCCX^TGAATTCrrGCGGAflAACTGTGAArGOC^ 

A^rATCCATCGCGTCCGCCA7C7CCANCAGCGGCACGCGGCGCATCTC5GGCAGCGTr^ 
GGTC37GCAG 



3674 SEQUENCE ID NO. 76

CrGCAGTGrrTAAAAAATAAAAIAAACTAAAi^rrr^ 
TTCTAAACACArTTACUGCCATATAATJUS^ 

CTAGAAAGTCn? CAC CCGGCCAAGATAACACATCT77AGG7AAAAATACCAAGAAATA37 
TCCCAG»CT77ArGCG<^rrGAGGCA«CAGATCAGTTG^ 

CCTGGGCAAT GTSG CAAAACCTQTCTCCAcrAAAAATACAAAAATTAGCCAGCCATGCT 
GTOCACACAKrrTAATrCCCAGCTACriGGG^^ 
CTAGGAGGGAAGAAGTrG2?AGCCANCTrAArGTCACTG CACTCZ\G3rrr^ 






77 



rSTCA- . CAC^A CTA^n r c 



CACTCAAT7 




ACAT7CXhAAAATACAAAAAGCAACCCAC0^3CATj~;7 



^7CAAAAAArTAG7S7AAA 



373« SSSCSSa ID HO. 73 

CA - — - 7777MGGv» -^C\GjrrrAT7ACGAAC77GG7AX3Gi»ACCJ 

GAAACCACajCGGCTG ^ 



3737 SEQUENCE ID NO. 79



3739 SEQUENCE ID NO. 80



2«« sz;~:rc: NC . 5: 




ACAT^TAGAGAArr^G^^ 

AAC~lACCT7AAAAC^ACKArjAAC^"c-^c— ^T^A^S^^ 7 ^^ 
GAArGGAAACAC\GACjGCl\A^^G^~CA^ "^^^ * * ^TUGCGAAG 

374S azccsrcs a .to. 93 

A^Gs. . . «Crr=T5AGTCCCAC^TrACCCAAGGCACC^-r :: ^i«--— - 

3743 SZQZZXCZ 13 NO. 94 

?2S^^^=^*H^pfiCK== ATrCAAGArAGAGTr=crc?T^TC^ 
ACAAGTTTrSrGCAGGTGT^ * **^^^^***^*^3CITGGAGC* GCAGCCAG7AA 






3~5C 



zz :ic. 35 



^^«T3ACC-VUArC^TGCC\C-^Cir > — , , ^ 






37S1 SSCC3.VC2 IS NO. 8S 



TAGA73SA5CCA 
TTGAACCUT 



<rrsTcr^CACvrc7?AAGTccrTcr~^ : .Tm^— r- 

GAA ^^C,C^ArtCA:rAGAAGAZACC3 



2:a zz mo. a: 







3733 ID MO. 33 







3TH4 szz~*z::zz :d jjg. 

» ^AAAGA7 AA7C AG77TTGAACG7AG rG777G0GC7GGGCGCVG 

nTCVATGCC^C\C77TGCGAGG2;CGAGGTGGCCGCATCACrTGAGG - 'C 

AGGAC - . ■7GAGAC7^GC77CAC:LXiCA7GCCA7AAGAC7CCA7C7C7ACTAAAAA7ACAA 
AAAA77AC^7AGGCA7GG7GCNGCl7GCr7G7AA7CC:^GC7AC7CAGCAC 
CCAGAA^'-w - -GOAACrTAGGAAGCAGAGGCTGTGGTGGAGCCGAGATCGCACCAr^GG 
AC7 C CAGGC7GGGIJAACAAGAGTGAAAA7C 7!I7 CT7AAAAAAAAAAAAAAAAAGCTAOTG 

CT^uIAHCTHAGnAGTCCS 



3755 10 TO. >0 

GAA77 . : JvJ7^CA7GTCC7A7G77C7777 rrCCrrTACTCCTTCCTACrGrCAGf AATG 

aaggctagc<;c7Ccacc::tgc^ccc7gaag7;^ 

caaggacat7gagt7ct7? ga7gaacg7gaag ccaccgtactaatctgcactgcctac^ 

C7G CAC7AC7C7A7GAGAGAGAAAG7A TG7^\77AT77AAAC:LAC77GGGT7GA7777C 

TATTAAOAAGT^^ 

07CA7A7GGTCTCCAGGGC\AACACTC>AC7G7GC7AC7G7AG7GTCAAAGCACG<^ 
ACA A7G7 A77AACCAAC^AGGG7GG7CAC7TrC7AATG^ 
A7ACT7GG7A77ACAC:TJNGCGGGA\CG7AC^GAAC^7C^ 
CAA7G7TGG7C77T7ATACGNG 



3737 SEQUENCE ID NO. 91

7G77CT7A7AGCCA7777AAA7A7AAGGGAG7C^(IAAG7A^ 
A77GG.7ACCTGGAA7AAAAA7Grrrr:C7G7GAA7t^^ 
7GACA77A7C77AAGACAAC7CCAG7TGC\Ar:AC7C7GCA^^ 
AAGCCA7ArrACCTT77TrC7»CUCZAC7^0^ 
7CTKTrC7A7AACAAC77CC7AAtAAGC7G7ACCCVkAAA^ 
A7 7Arr rCAA7 A7AAA GCACCCAC7AGA7SGAGCCUrS7CTGCr^ 
r7CTrrCC\7A7CrrAGA»777?C7r:3AAGCAA7T^ 7 C7CA 

GGTTAAAA 777C7TAG ? TAG 



3310 SSQGSJCS ID NO. S3 
TATGCTTGCCTA7TCTI ^QCa G7AaC^^ 

CTTCTAGAGATAAGTTAA777TTAG * k 17 CC7CCTCAC7G7GGAACATTCAAAAAA7 

A£AAAAAGGAACCC\GG7GCA7C-TG^ 

Cr7GGGCCCAGCAG77C^CAAGCACCrrGGGGUV£^ 

GAAAACAAAAAACAAArAT7GGAAG7A77TTA7ATGCA70GAA7C7AXATC7rCATGAAAA 

^^ ^^^^^^7A7A 7ATA77 ATCATTAG77ATCAAGATr7AGT3AXAAITTA7G7TA 

TT77GGGA7T7CAATGCCTr7TrACGC7AT7G7CrCAAAAAAATAAA^ 

AAAAAG77GTAAC7TGAAAAA7AAACAr77CCA7A777ArAGCCAACIAAG7GGGT7TOG 

GG7MGG77CGG77GG7TGG7 







C7T:iCArrTAC=TAG7CCj;CTCCTG AAAAvJVtf 

3333 SiC03rCS 13 ifC. 34 

AArGAAAiIGTCTCCCAr7rCTACTTCTTrCTACAOGAC\C3GCArCCATC 3^^^~^T 

cr: . ~*a. ~^v-. CTT^rrACAcrrrrtACAcaccsTCWTsarcsG-cAGAGGsac 

'5SS32JC2 13 SO. 3« 

C^TCCAAAGGAAGTTAGAG5CCAGC7CAG7wTACXCCrCC^CT3^CAG73CCCAC''C 
TOTCK^GA*TrrCTCCAOC?lXrCACirr:AAGGGGAG 

GTT7A.^->A < C^TArrGATTrGCCrr:CAC7n , A n g W ' u awUraagTrrCACTC 

3836 13 SO. 97 

AATGTCACGGATTCCrTTAGGTAG^ACACCCATCAAC 

GAC ^CrrrACAG-CTAATrCrA -ATCACA— AACITrrATTTGg^ 

XZ^Xi XtlLl * i 1 - " - i i * - i * . - - » . i'l 1 * * 1 . n iVGCMGANAGAGTCTCSCTCTGT 
CGCCAGCTTGGAGTGNAAwGTGCGATC 



"33 SZZ^CZ ZZ .VC. 3* 



ACCTGC 




SCAA7AC7 

tgccgcgt 



:ggaaga£ac:: 




rrcGcc-GArAAcnaAAmrc 



TAC7A 



3343 S2 



id ire. 3? 



TAA7TATA77SAAA7GC77: 



:c:r-crAC-GTCAr: 



tctt: 



GAraAArrACTCrSAACTTTTAATTCTJr 



'3 



\CA7A7AGG7 CA7AC IMlXr IA7A7AAAAG 
C7CL^<^C77AA«C7AC^T7AA^^ 



383£ SEQUENCE ID NO. 100



AAAACAAAGCC777TGACG7" 
77A7AC7C77A7AAAAAACAC 
A7STS7TC 



-SAAAAGC^UAAO^AAC^GAACTTTSTGCA^ 
" rAGA~ACArrAAGCV^UACAAACrr 



CAGCCTAAGCTrrrTAAGGAC—CAAAGGC 



707777 

— w,^^^^^ ^""IjT^Al. '"Li^icrrr" 



333S SEQUENCE ID NO. 101



7GAGCCACC 



G CaTCCGCCCrCCTCGGCCTCC CUA<r:G77GCK^ 

Gv - * * * * * *vl .v^r«"SGAGACAGAG7C7"?AC7C7GT7 „ 

GCAC7TC^7GCAA7CT73GTrc\c:G^ 



CrCAG77TC7 GGAGTAGC7GG0A~ACAGGT 
AT777TT777AC7AGAGACACGG7TTC 



ACC7SCAC7 
'C7GC 



:ttggk 

^AGGC7GG7CrrGAAC7CC7SA 






335 7 SZZZZ:Z2 12 .10. i-32 




J«33 SEQUENCE ID NO. 103




3331 12 SO. 105 

3303 SEQUENCE ID NO. 105










2 3 €3 SEQUENCE ID NO. 107

ccccACTCAGccrrrixrrGCAC^crTCA^ 

CTCAAAAACAAAAACVJUAC^AA^ACAGC^ 

AACAATAOCACVG "i 1' -iACACAGw „ tAT^GTAAAA i 1 A CAAA C GTGGGACA- ♦AATATCTA 
AC*a - • .jGGAGCCACCACA. • A- - »- . AAATAA7C .^i « • •^wiCAAAA^TA* -sj •^GA.<_.. J 
TCAAAACCCSTGTa A^ x - . ^„ . « CAGGGA AGT STT rAAAACCrArAACGTT CC T- G TGGAC 
CACAC TAc ■ j k 1 CCAi * C 1 JGAA * T » sjGGTGTGw T .jCGAACGA *' J » CCA 1 1' CA 

CTGGAAAGCC CCACCTCCACCCCAGCrtGGCATACCT CACCATGACTAGTCCCTCATCGIf C 
C7CGTCTTTACCAAAGTACCTCCCTG .AACGGACCGCCrCGCTCXT C7TGG2f7*GTGACTCA 
GTACACCGTAAAACCCAAGA 



3 3*4 SSCwZNCZ ID NO. 103 

CTSCAGC:rrTCACCrCC?GG»TC^TC^^ 

AACCACAGGC~GCACCACCArCCCCGGCTAAriIG2rrGTACCTTC^ 

T:rGCCACGTTGCCCXGGCTGG7C7rGAAC7CCG^ 

7CCCX^G7acCAAGATTACAGC<ArCAGC^ 

AAAGTTACTATAGCTAArCAATGACTTACCCA^ 

TTCTCTAAXTCAAGATAAAGAGACGAGCA^^ 

CAAAAC^AATTArCAGTC77GGGGTrC3ACCA7A7AC?GAAAX^ 

AC3CAG7CCAC3CAM^GGCAC:5GATAA^^ 

AGACACGC7GCCGGAG7CGA 



3373 32CCZNCC ZD *C . 1C3 

CTwi^nepra^^iTO^iArrTTi G ru: j ^ ccccnTAAACArrAArrGCc ccTA CACAyTA 

ATAACA2TT TTCTCAGAGCT CTTAACCTCAiVAALV " 1 ^ArCACACAGC CCCTTC AAGGCTw 
VI . . JA ACCCCAGGTCGGT7AA*rATTCCAGCCArCTGAGGAG ;-- J. rjGACAATT 

GGACCTCACCCTAGCA C ' J " ' „ CjlCAC ^CCAQCATCAGAArCACCTGGGAGCl TI'-'A 

A 



3973 SEQUENCE ID NO. 110

TC:TNrc3INMtfCCCC37AAA77r:T^ 

GG7CACTCTC^GGTA7GAT7TCACAXrTCAXXAC7ACC^Cr 

TaAATGAGASAAGTCAGTAAACGACXCACAAAArTAGGCTrC^ 

TSCGCGTTOIC^ACXVTACCAGTtfCCAGA^ 

T 



3330 SEQUENCE ID NO. 111

CGOTC^C3G2lCGNTCXAGAACTAG7GGACCCCCGGGG4rTGCAGGACCCAACGCTC 
GATGCCCCGCGTGCGGTTGCCGGAGACGGCGGACGCGATGGATATGTTCTGCCAAGGs* ▲ - 
GGTrrGCGCATTCACAGrrCTCCGCAAGAArTGACrGGCTC 




I G 



acacai 



"T7CT737AA 
-AG" ACKAAACAC7AGAA 



K G 



3393 SEQUENCE ID NO. 112



GATTTAircrrGGGTtsrrTT 



i . - AT^ rcrrTTSCTANAAAAA^tA^INNwCICINTAAAAr 

J^ ^AGC SACCCyCTAAAAAArrr-^^T: ^^ , , . <J 

T?*77A w\TTTTTGGJJCAGJ JNTCrSTAGTwTrcrrC CCTCAAAC 






3383 SlC^ICi ID 114 



3934 IE >?C. IIS 



gctssttac 

A 



37AA7tAACSCTCACTAAAG3GAACXAAA 



rcwTGccAcacrc: 



298S S2QCZ3C3 ID 30. 115 




3397 SEQUENCE ID NO. 117







3 H3 SZQ'SZllCZ ID MC. 113 



T:r— CG~~ACTCCT7^CUcrGC— AAACTC 



rrc-Tcr: 



3533 SZQJZStCZ ID SC. 113 



^^^^^ ^ is ^ c ^-KSCCGC3AAGC:^TCTCC7Jl^^ 



rrrcr< 

T 






3 39C SEQUENCE ID NO. 120

CGCjGC CTCCrAAACTGCTGCCATTACAGGwTOAGCCAC: 
ATTTCTArrGGCrAGCaCTCCrCTAAArcrrcrGTTCCTTC 



374 -3 570 ffiQC 



3 NC. 121 



tgc: 



>ACAG ^CTAArrctATATZAOrGT AA L. . , ..U ' i ' l ' UG AZAXATCAGTAATAG 
TGGCATAAmTGwlT 



:tca 



^TAGTACAGATSGCCTTTC 



IACCAC 



rocrrc 



GvKaTGAGCCACTGTTCCrGa 






8 32 -3c 5 .5:5 SSCCSrCZ ZZ JIC . 



:22 



GACACT-TGCGT^CCrr^GIIATTrAATSAC "--'-•-~~~<~.~. J.*.. 






GAC?£^77CACAC\T^^ 

ApTCAA;. CTCGAAGAA GAG CT ATCAAAAAAAC CTAC CTTGCG7NGCTTT CArACCGT^ CA 

* AAAAC7ACT7GGAAAAAX % rr:AAAA<^^ 
K^CTKSAXTrrAGTSGGGirxcnTAAAA^ 



GcrcArcAra— : 



GAGAATGC 



SEQUENCE ID NO. 124



SAGAAGCCCCCGCACAGCATAGAGAArGCTCTTCACCT 



^^^AAACTAAAATCACAGAGGjC^CACATCATTTAAGAT^^ 

tAArrrrrrrc*AAG7AGTrr: 



; * ^~ A <^~^~ -^GArroAATc^csrcrcrrrcrAAAA'rrrAAc 

iGAAAATGCTTr^TAAT^ArGT372AATSr3Tv7rC 

GTArAArO CACgT gcrArAAgGTCAGCAr^ACACACAGAT C: : ^cr: i'CCACCCTGTTC 






5'-atcrtccggcaggcatatct-3'   Sequence ID No. 126
5'-tgaaatcacagccaagatgag-3'   Sequence ID No. 127
5'-tggagactggaacacaac-3'   Sequence ID No. 128
5'-gHgtggccagggtagagaact-3'   Sequence ID No. 129
5'-ccatagcctGtttcgtagc-3'   Sequence ID No. 130
5'-ccatagcctAtttcgtagc-3'   Sequence ID No. 131







SEQUENCE ID NO. 132 



FILE NAME: ARMP.UPD 



TGGGACAGGCAGCTCCGGGGTCCGCGGTTTCACATCGGAAACAAAACAGCGGCTGGTCTGGAAGG 
AACCTGAGCTACGAGCCGCGGCGGCAGCGGGGCGGCGGGGAAGCGTATACCTAATCTGGGAGCCT 
GCAAGTGACAACAGCCTTTGCGGTCCTTAGACAGCTTGGCCTGGAGGAGAACACATGAAAGAAAG 
AACCTCAAGAGGCTTTGTTTTCTGTGAAACAGTATTTCTATACAGTTGCTCCAATGACAGAGTTA 
CCTGCACCGTTGTCCTACTTCCAGAATGCACAGATGTCTGAGGACAACCACCTGAGCAATACTGT 
ACGTAGCCAGAATGACAATAGAGAACGGCAGGAGCACAACGACAGACGGAGCCTTGGCCACCCTG 
AGCCATTATCTAATGGACGACCCCAGGGTAACTCCCGGCAGGTGGTGGAGCAAGATGAGGAAGAA 
GATGAGGAGCTGACATTGAAATATGGCGCCAAGCATGTGATCATGCTCTTTGTCCCTGTGACTCT 
CTGCATGGTGGTGGTCGTGGCTACCATTAAGTCAGTCAGCTTTTATACCCGGAAGGATGGGCAGC 
TAATCTATACCCCATTCACAGAAGATACCGAGACTGTGGGCCAGAGAGCCCTGCACTCAATTCTG 
AATGCTGCCATCATGATCAGTGTCATTGTTGTCATGACTATCCTCCTGGTGGTTCTGTATAAATA 
CAGGTGCTATAAGGTCATCCATGCCTGGCTTATTATATCATCTCTATTGTTGCTGTTCTTTTTT^ 
CATTCATTTACTTGGGGGAAGTGTTTAAAACCTATAACGTTGCTGTGGACTACATTACTGTTGCA 
CTCCTGATCTGGAATTTTGGTGTGGTGGGAATGATTTCCATTCACTGGAAAGGTCCACTTCGACT 
CCAGCAGGCATATCTCATTATGATTAGTGCCCTCATGGCCCTGGTGTTTATCAAGTACCTCCCTG 
AATGGACTGCGTGGCTCATCTTGGCTGTGATTTCAGTATATGATTTAGTGGCTGTTTTGTGTCCG 
AAAGGTCCACTTCGTATGCTGGTTGAAACAGCTCAGGAGAGAAATGAAACGCTTTTTCCAGCTCT 
CATTTACTCCTCAACAATGGTGTGGTTGGTGAATATGGCAGAAGGAGACCCGGAAGCTCAAAGGA 
GAGTATCCAAAAATTCCAAGTATAATGCAGAAAGCACAGAAAGGGAGTCACAAGACACTGTTGCA 
GAGAATGATGATGGCGGGTTCAGTGAGGAATGGGAAGCCCAGAGGGACAGTCATCTAGGGCCTCA 
TCGCTCTACACCTGAGTCACGAGCTGCTGTCCAGGAACTTTCC\GCAGTATCCICGCTGGTGAAG 
ACCCAGAGGAAAGGGGAGTAAAACTTGGATTGGGAGATTTCATTTTCTACAGTGTTCTGGTTGGT 
AAAGCCTCAGCAACAGCCAGTGGAGACTGGAACACAACCATAGCCTGTTTCGTAGCCATATTAAT 
TGGTTTGTGCCTTACATTATTACTCCTTGCCATT^ 

CCATCACCTTTGGGCTTGTTTTCTACTTTGCCACAGATTATCTTGTACAGCCTTTTA 
TTAGCATTCCATCAATTTTATATCTAGCATATTTGCGGTTAGAATCCCATGGATGTTTCTTCTTT 
GACTATAACCAAATCTGGGGAGGACAAAGGTGATTTTCCTGTGTCCACATCTAACAAAGTCAAGA 
TTCCCGGCTGGACTTTTGCAGCTTCCTTCCAAGT 

GAAGGAGGTGCCTATAGAAAACGATTTTGAACATACTTCATCGCAGTGGACTGTGTCCCTCGGTG 

CAGAAACTACCAGATTTGAGGGACGAGGTCAAGGAGATATGATAGGCCCGGAAGTTGCTGTGCCC 

CATCAGCAGCTTGACGCGTGGTCACAGGACGATTTCACTGACACTGCGAACTCTCAGGACTACCG 

GTTACCAAGAGGTTAGGTGAAGTGGTTTAAACCAAACGGAACTCTTCATCTTAAACTACACGTTG 

AAAATCAACCCAATAATTCTGTATTAACTGAATTCTGAACTTTTCAGGAGGTACTGTGAGGAAGA 

GCAGGCACCAGCAGCAGAATGGGGAATGGAGAGGTGGGCAGGGGTTCCAGCTTCCCTTTGATTOT 

TTGCTGCAGACTCATCCTTTTTAAATGAGACTTGTTTTCCCCTCTCTTTGA 

GTAGATTGCCTTTGGCAATTCTTCTTCTCAAGCACTGACACTCATTACCGTCTGTGATTGCCATT 

TCTTCCCAAGGCCAGTCTGAACCTGAGGTTGCTTTATCCTAAAAGTTTTAACCTCAGGTTCCAAA 

TTCAGTAAATTTTGGAAACAGTACAGCTATTT 

TGGATTTTCCACCAAATTCTGAATTTGTAGACATACTTGTACGCTCACTTGCCCCCAGATGCCTC 
CTCTGTCCTCATTCTTCTCTCCCACACAAGCAGTCTTTTTCTACAGCCAGTAAGGCAGCTCTGTC 
RTGGTAGCAGATGGTCCCATTATTCTAGGGTCTTACTCTTTGTATGATGAAAAGAATGTGTTATG 
AATCGGTGCTGTCAGCCCTGCTGTCAGACCTTCTTCCACAGCAAATGAGATGTATGCCCAAAGCG 
GTAGAATTAAAGAAGAGTAAAATGGCTGTTGAAGCAAAAAAAAAAAA 







SEQUENCE ID NO. 133



FILE NAME: ARMP.PRO 



WTELPAPI5YFQNAQMSEDNHI^HTVRSQNDNRERQEHNDR^IX3HPEPLSNGRPQGNSRQVViQ 
DE~E DEELTIiCf G AXHVIMLFVPVTLCMVVVVATI XS VS F YTRKDGQ LI YT P FTEDTETVGQRAL 
HS ILNAAIMISVI WIlTILLVVLYKYRCYiv/IKAWLI ISS LLLLFFFS FI YLGEV7KTYNVAVDY 
ITVALLIWNFGWGMI S IHWKGP LRLQQA YLIMIS ALMALVFIKYLPEWTAWLILAVTS VYDLVA 
VLCPKGPLRMLVETAQERNETIJPALIYSSTMVWLVNMAEGDPEAQRRVSXNSKYNAZSTEIIESQ 
DTVAZNDDGGFSEZWZAQRDSKLGPHRSTPESRAAVQELSSSILAGEDPEERGVKLGLGDFIJYS 
VLVGXASATASGDWNTTIACFVAII^GLCLTIJ!JiIAIFK2C^LPALPISITFGLVFYFATDYLVQP 

FMDQLAFHQFYI * 
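SEQUENCE ID NO. 133 (ARMP.PRO) appears to be the conceptual translation of the open reading frame in SEQUENCE ID NO. 132 (ARMP.UPD). Reading SEQUENCE ID NO. 132 in frame from the ATG in "...CTCCAATGACAGAG..." gives MTELPAPLSYFQNAQMSEDNHLS, which agrees, apart from evident scanning errors, with the opening residues of SEQUENCE ID NO. 133. The Python sketch below illustrates that translation step under the standard genetic code. The function name `translate` and the choice to map unreadable codons to `X` are illustrative assumptions, not part of the original listing.

```python
from itertools import product

# Standard genetic code; codons are enumerated in TCAG order with the third base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence up to (not including) the first stop codon.

    Whitespace is ignored. Any codon containing a character other than A, C, G or T
    (for example N, or an OCR artifact such as 7) is rendered as 'X' (an assumption).
    """
    seq = "".join(cds.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":  # stop codon ends the reading frame
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # Start of the reading frame copied from SEQUENCE ID NO. 132 (ARMP.UPD).
    cds_start = ("ATGACAGAGTTACCTGCACCGTTGTCCTACTTCCAGAATGCACAG"
                 "ATGTCTGAGGACAACCACCTGAGC")
    print(translate(cds_start))  # MTELPAPLSYFQNAQMSEDNHLS
```

Because the scanned listing contains character substitutions, a sequence copied from it should be checked before translation; any codon with a non-ACGT character shows up as `X`, which makes such errors easy to spot.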








SEQUENCE ID NO. 134



FILE NAME: MARMP.UPD 



ACCANACANCGGCAGCTGAGGCGGAAACCTAGGCTGCGAGCCGGCCGCCCGGGCGCGGAGAGAGA 
AGGAACCAACACAAGACAGCAGCCCTTCGAGGTCTTTAGGCAGCTTGGAGGAGAACACATGAGAG 
AAAGAATCCCAAGAGGTTTTGTTTTCTTTGAGAAGGTATTTCTGTCCAGCTGCTCCAATGACAGA 
GATACCTGCACCTTTGTCCTACTTCCAGAATGCCCAGATGTCTGAGGACAGCCACTCCAGCAGCG 
CCATCCGGAGCCAGAATGAC\GCCAAGAACGGCAGCAGCAGCATGACAGGCAGAGACTTGACAAC 
CCTGAGCCAATATCTAATGGGCGGCCCCAGAGTAACTCAAGACAGGTGGTGGAACAAGATGAGGA 
GGAAGACGAAGAGCTGACATTGAAATATGGAGCCAAGCATGTCATCATGCTCTTTGTCCCCGTGA 
CCCTCTGCATGGTCGTCGTCGTGGCCACCATCAAATCAGTCAGCTTCTATACCCGGAAGGACGGT 
CAGCT AATCTACAC CC CATT CACAGAAGACACTG AGACTGT AGG C CAAAGAGC C CTG CACTCG AT 
CCTGAATGCGGCCATCATGATCAGTGTCATTGTCATTATGACCATCCTCCTGGTGGTCCTGTATA 
AATACAGGTGCTACAAGGTCATCCACGCCTGGCTTATTATTTC\TCTCTGTTGTTGCTGTTCTTT 
TTTTCGTTCATTTACTTAGGGGAAGTATTTAAGACCTACAATGTCGCCGTGGACTACGTTACAGT 
AGCACTCCTAATCTGGAATTTTGGTGTGGTCGGGATGATTGCCATCCACTGGAAAGGCCCCCTTC 
GACTGCAGCAGGCGTATCTCATTATGATCAGTGCCCTCATGGCCCTGGTATTTATC^AGTACCTC 
CCCGAATGGACCGCATGGCTCATCTTGGCTGTGATTTCAGTATATGATTTGGTGGCTGTTTTATG 
TCCCAAAGGCCCACTTCGTATGCTGGTTGAAACAGCTCAGGAAAGAAATGAGACTCTCTTTCCAG 
CTCTTATCTATTCCTCAACAATGGTGTGGTTGGTGAATATGGCTGAAGGAGACCCAGAAGCCCAA 
AGGAGGGTACCCAAGAACCCCAAGTATAACACACAAAGAGCGGAGAGAGAGACACAGGACAGTGG 
TTCTGGGAACGATGATGGTGGCTTCAGTGAGGAGTGGGAGGCCCAAAGAGACAGTCACCTGGGGC 
CTCA^CGCrCCACTCCCGAGTCAAGAGCTGCTGTCCAGGAACTTTCTGGGAGCATTCTAACGAGT 
GAAGACCCGGAGGAAAGAGGAGTAAAACTTGGACTGGGAGATTTCATTTTC TACA GTGTTCTGGT 
TGGTAAGGCCTCAGCAACCGCCAGTGGAGACTGGAACACAACCATAGCCTGCTTTGTAGCCATAC 
TGATCGGCCTGTGCCTTAC\TTACTCCTGCTCGCCATTTTCAAGAAAGCGTTGCCAGCCCTCCCC 
ATCTCCATCACCTTCGGGCTCGTGTTCTACTTCGCC\CGGATTACCTTGTGCAGCCCTTCATGGA 
CCAACTTGCATTCCATCAGTTTTATATCTAGCCTTTCTGCAGTTAGAACATGGATGTTTCTTCTT 
TGATTATCAAAAACACAAAAACAGAGAGCAAGCCCGAGGAGGAGACTGGTGACTTTCCTGTGTCC 
TCAGCTAACAAAGGCAGGACTCCAGCTGGACTTCTGCAGCTTCCTTCCGAGTCTCCCTAGCCACC 
CGCACTACTGGACTGTGGAAGGAAGCGTCTACAGAGGAACGGTTTCCAACATCCATCGCTGCAGC 
AGACGGTGTCCCTCAGTGACITGAGAGACAAGGACAAGGAAATGTGCTGGGCCAAGGAGCTGCCG 
TGCrCTGCTAGCTTTGACCGTGGGCATGGAGATTTACGCGCAqTGTGAACTCTCTAAGGTAAACA 

AAGTGAGGTGAACC ^ 



SEQUENCE ID NO. 135
FILE NAME: MARMP.PRO



KTEIPAPLSYFQNAQMSEDSHSSSAIP3QNDSQERQQQHDRQRLDNPEPISNGRPQSNSRQVVEQ 
DE r EDEELTLKYGAKHVI24LFVPVTLC>IVVVVATIXSVS FYTRXDGQLI YTP FTEDTETVGQRAL 
HSILNAAimiSVTVIMTILLVVTYKYRCY^ 
VTVAIilWNFGWGMIAIHWKGPIilLQQAYLIKESAimLVF^ 
VLCPKGPIJIMLTCTAQFJ^TIJPALIYSSTMVWLVII^ 

DSGSGNDDGGFSEEWEAQRDSHLGPHRSTPESRAAVQELSG3ILTSEDPEERGVKLGLGDFIFYS 
VLVGKASATASGDWNTTIACFVAILIGLCLTLIiLLAIFKKAIiPALPISITFGLVFYFATDYLVQP 

FMDQLAFHQFYI* 



SEQUENCE ID NO. 136

10 20 30 40 50 60

GAATTCGGCA CGAGGGCATT TCCAGCAG7G AGGAGACAGC CAGAAOCAAG CTTTTGGAGC 

70 80 90 100 110 120

TGAAGGAACC TGAGACAGAA GCTAQTCCCC CCTCTGAA3T TTACTGATQA AGAAACTOAQ 

130 140 150 160 170 180

GCCACAGAGC TAAAGTGACT TTTCCCAAOG TCGCCCAGCG AGGACGTGGG ACTTCTCAGA 

190 200 210 220 230 240 

CGTCAGGAGA G7GATGTGAG GGAGCTGTGT OACCATAGAA AGTGACGTGT TAAAAACCAG 

250 260 270 280 290 300 

CGCTGCGCTC TCTGAAAGCC AGOGAGCATC ATTCATITAG CCTGCTGAGA AGAAGAAACC 

310 320 330 340 350 360 

AAGTGTCCGG OATTCAAGAC CTCTCTCJCOO CCCCAAGTGT TCCTGGTGCT TCCAGAGOCA 

370 380 390 400 410 420

OGGCTATGCT CACAITCATG GCCTCTGACA GCGAGGAAGA AGTGTGTQAT GAGCGGACGT 

430 440 450 460 470 480 

CCCTAATGTC GGCCGAGAGC CCCACGCCGC GCTCCTOCCA QGAGGGCAGG CAGGGCCCAO 

490 500 510 520 530 540 

AGGATGGAGA GAATACTGCC CAGTGGAGAA OCCAOOAOAA COACIGA/QGAC GGTSAGGAQG 

550 560 570 580 590 600 

ACCCTGACCG CTATGTCTGT AGTGGGOTTC CCGGGCGGCC GCCAGGCCTG GAGGAAGAGC 

610 620 630 640 650 660

TQACCCTCAA ATACGGAGCG AAGCATSTGA TCATGCTOTT TGTQCCTOTC ACTCT3TGCA 

670 680 690 700 710 720

TOATCGTGOT GOTAGCCACC ATCAAGTCTG TOCGCTTCTA CACAGAGAAG AATGGACAGC 

730 740 750 760 770 780

TCATCTACAC CCCATTCACT GAGGACACAC CCTCOGTOGG CCAGCGCCTC CTCAACTCCG 

790 800 810 820 830 840

TGCTGAACAC CCTCATCATG ATCAOCOTCA TCOTGOTTAT GACCATCTTC TTGOTQGTGC 

850 860 870 880 890 900 

TCTACAAGTA CCGCTGCTAC AAGTPCATCC AIGQCTGGTT GATCATGTCT TCACTGATGC 

910 920 930 940 950 960

TGCTGTTCCT CTTCACCTAT ATCIACCTTG GGGAAGTOCT CAAGACCTAC AATGTOOCCA 

970 980 990 1000 1010 1020 

TOGACTACCC CACCCTCTTO CTQACTQTCT GGAACTTCGG GGCAGTOGGC ATGGTGTGCA 

1030 1040 1050 1060 1070 1080 

TCCACTGGAA OGGCCCTCTG GTCCTUCAOC AGGCCTACCT CATCATOATC AGTGCGCTCA 

1090 1100 1110 1120 1130 1140

WGCCCTAOT GTTCATCAAG TACCTCCCAG AGTOOICCGC GTGGC3TCATC CTGGGCOCCA 

1150 1160 1170 1180 1190 1200

TCTCTGTGTA TGATCTCGTG GCTGTGCTOT GTCCCAAAGG GCCTCTGAGA ATGCTGGTAG 

1210 1220 1230 1240 1250 1260 

AAACTGCCCA GGAGAGAAAT GAGCCCATAT TCCCTCCCCT GATATACTCA TCTGCCATGO 




1270 1280 1290 1300 1310 1320 

TGTGGACGGT TGGCATGGCG AAGCTGGACC CCTCCTCTCA GGGTGCCCTC CAGCTCCCCT 

1330 1340 1350 1360 1370 1380

ACGACCCGGA GATGGAAGAA GACTCCTATG ACAGTTTTOQ OQAGCCTTCA TACCCCGAAG 

1390 1400 1410 1420 1430 1440 

TCTTTOAGCC TCCCTTCACT GGCTACCCAG QOQAGGAGCT OOAGGAAGAG GAGGAAAGGG 

1450 1460 1470 1480 1490 1500 

GCGTGAAGCT TGGCCTCGGG GACTTCATCT TCTACAOTGT GCTGGTGaOC AAGQCGGCTG 

1510 1520 1530 1540 1550 1560 

CCACG3GCAO CGGGGACTGG AATACCACGC TGGCCTGCTT CGTGGCCATC CTCATTOOCT 

1570 1580 1590 1600 1610 1620 

TOTOTCTGAC CCTCCTGCTG CTTGCTCIOT TCAAGAAGOC OCTGCCCGCC CTCCCCATCT 

1630 1640 1650 1660 1670 1680 

CCATCACGTT CGGOCTCATC TTTTACTTCT CCACGGACAA CCTGGTGCGG CCGTTCATGG 

1690 1700 1710 1720 1730 1740

ACACCCTGGC CTCCCATCAG CTCTACATCT GAGGGACATO OTGTGCCACA. GGCTQCAAGC 

1750 1760 1770 1780 1790 1800 

TOCAGGGAAT TTTCATTGGA TGCAGTIGTA TAOTTTTACA CTCTAOTGCC ATATATITTT 

1810 1820 1830 1840 1850 1860

AAGACTTTTC TriCCTTAAA AAATAAAGTA CGTQTTTACT TGGTGAGGAG GAGGCAGAAC 

1870 1880 1890 1900 1910 1920

CAOCTCTTTG GTOCCAGCTG OTTCATCACC AGACTTTGOC TCCCCCTTPG GOOAGCGCCT 

1930 1940 1950 1960 1970 1980

CGCTCACGG ACAGGAAGCA CAGCAGGTCT ATCCAGATGA ACTGAOAAGG TCAGATTAGG 

1990 2000 2010 2020 2030 2040 

aTGGGGAGAA GAGCATCCGG CATGAGGGCT GAOATGCCCA AAQAGTGTGC TCGGGAGTQG 

2050 2060 2070 2080 2090 2100 

CCCCTGGCAC CTOGGTGCTC TGGCIX30AGA GGAAAAGCCA GTTCCCTACG AGGAOTGTTC 

2110 2120 2130 2140 2150 2160 

CCAATGCTTT OTCCATGATG TCCTTGTTAT TTTATOJCCY TTANAAACTG ANTCCTNTIN 

2170 2180 2190 2200 2210 2220

TTNnBCGGC AGTCACMCTN CTGGGRAGTG GCTTAATAGT AANATCAATA AANAGNIT3AG 

2230 2240 2250 2260 2270 2280 

TCCIOTTAGA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

2290 2300 2310 2320 2330 2340 

AAAAA 






Primary 



969 ggtiaccgccaccatgacagaggtacctgcac   Sequence ID No. 138

970 gaattcactggctgtagaaaaagac   Sequence ID No. 139

989 ggatccggtccacttcgtatgctg   Sequence ID No. 140

990 ttttttgaattcttaggctatggttgtgttcca   Sequence ID No. 141

994 gattagtggttgttttgtg   Sequence ID No. 142

995 gattagtggctgttttgtg   Sequence ID No. 143

1003 tttttccagctctcattta   Sequence ID No. 144

1004 tttttccagttctcattta   Sequence ID No. 145

999 tacagtgttctggttggta   Sequence ID No. 146

996 aaacttggattgggagat   Sequence ID No. 147

100 tacagl^jttgtggttggta   Sequence ID No. 148
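Several oligonucleotides in these listings occur in pairs that are identical except at a single base: the entries numbered 994 and 995, the entries numbered 1003 and 1004, and the two primers written with an upper-case G or A at the differing position. Such pairs are consistent with allele-specific probes that distinguish a reference base from a variant. A minimal Python sketch, assuming equal-length probes written 5' to 3', reports where two probes differ and where a probe sits in a stretch of SEQUENCE ID NO. 132. The helper names `mismatch_positions` and `locate` are hypothetical.

```python
def mismatch_positions(probe_a: str, probe_b: str) -> list[int]:
    """Return the 0-based positions at which two equal-length probes differ (case-insensitive)."""
    if len(probe_a) != len(probe_b):
        raise ValueError("probes must have the same length")
    return [i for i, (a, b) in enumerate(zip(probe_a.lower(), probe_b.lower())) if a != b]


def locate(probe: str, reference: str) -> int:
    """Return the 0-based start of an exact, same-strand match of probe in reference, or -1."""
    return reference.upper().find(probe.upper())


if __name__ == "__main__":
    pairs = [
        ("gattagtggttgttttgtg", "gattagtggctgttttgtg"),  # entries 994 / 995
        ("tttttccagctctcattta", "tttttccagttctcattta"),  # entries 1003 / 1004
        ("ccatagcctGtttcgtagc", "ccatagcctAtttcgtagc"),  # upper-case marks the differing base
    ]
    for a, b in pairs:
        print(a, b, mismatch_positions(a, b))

    # Fragment copied from SEQUENCE ID NO. 132 (ARMP.UPD).
    fragment = "AAAGCCTCAGCAACAGCCAGTGGAGACTGGAACACAACCATAGCCTGTTTCGTAGCCATATTAAT"
    print(locate("ccatagcctGtttcgtagc", fragment))  # matches the G form
    print(locate("ccatagcctAtttcgtagc", fragment))  # -1: the A form is absent here
```

Only exact, same-strand matches are found; a probe designed against the opposite strand would need to be reverse-complemented first.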






SEQUENCE ID NO. 149



FILE NO. 874-984.GEN



GTCTAGATAAGNC*ACATTC\G^ 

AAGAGTGAGAAAJkATTTTCCC\GGAATCCCGGT^ 

AAGTTAC\kCCCCACAACCTTAGAGCTT^ 

CTTGGCTTGGTCAGGATTCACCACCAGAGTCATGTGGGAGGGGGTGGG^^ 
ATTCTKCCTCAGGAjUVTA^ 

GCCCATGCTTTGTGGTTTAAGGGCCAGCrAGTTACAATGACA.GCrAGTTACTGTTTCCATGTAAT 
TTTCTTAAAGGTATTAAATTTTTCTAAATA^ 

AGWAA.GGGAGTCACAAGAC^CTGTTGCAGAGAATGATGATGGCGGGTTCAGTGAGGAATGGGAA^ 

CCCAGRGGGACAj^TTCATCTAGGGCCTCATCGCTCT^^ 

CTTTCCANCAGTATCCrCGCrGGTGAAGACCCAGAGGAAAGNATGT^ 

AGTCATGGATTCCTTTAGGTAGCTACA.TTATCAACCTTTTTGAGAATAAAAT 

TACAGTCTAATTCTATATCACATGTAACTTTTATTTGGATATATCAGTAATAGTGCTTTTTYNTT 

TTTTTTTT'l " ! TTTT T TTTTTTTTTTNGGNGANAGAGT CTCGCTGTGTCGG CAGGTTGGAGTGCAA 

TGGTGCGATCTTGGCTCACTGAAAGCTCCA.CCNCCCGGGTTCAAGTGATTCTCCTGCCTCAGCCN 

CCCAAGTAGNTGGGACTACAGGGGTGCGCCACCACGCCTGGGATAATTTTGGGNTTTTTAGTAGA 

GATGGCGTTTCACCANCTTGGNGC\GGCTGG7CITGGAACTC 

TAGCCTCCCCAAA.GTGCTGGGATTNCAGGGGTGAGCCACTGTTCCTGGGCCTC 

SEQUENCE ID NO. 150
FILE NO. 885-1012.GEN

CTGCV7TGAGCCGAGA7C\TGCTGCTGTACT 

AAAAAAAAAAAATATTAATTAATATGATNAAATGATGCCTATCTCA 
TTAGKAC\i\G7GCTGGG7A7AAACTATANATTC 

7GA7AAA7AACAGCAGCA7C7ACA.G77AAGACTCCAGAG7CAG7C^CA7AGAA7C7GGNACT 

ATTGTAGNAAACCCCNMMAGAAAGAi^ 

TTCTCTCATTCATTGTGGGGTTGAGTA^ 

AGTGACCAACTTTTTAATATTTGTAACC^ 

7TTC\7T77CTACAG7G7TC7GG7TGG7AAAGCCTCAGCAAC\GCCA^ 

CCA7AGCC7G777CG7AGCCA7A77AA77G7MMS7A7ACACTAA7AAGAA7G7G7C^GAGC7CT 

AA7G7CMAAACTITGA77ACACAG7CCCIT7AAGGCAG77C7G77TrAACCCC^ 

7A77CCAGC7A7G7GAGGAGCT777NGA7AA77GGACC7CACC77AG7AG77C7C7ACCC7GGCC 

ACACA77AGAA7CAC77GGGAGC7777AAAAC7G7AAGC7C7GCCC7GAGA7A77C77AC7CAii7 

77AA77G7G7AG77T77AAAA77CCCCAGGAAA77C7GG7A777C7G777AGGAACCGCrGCC7C 

AAGCC7AGCAGCACAGA7A7G7AGGAAA77AGC7C7G7AAGG77GG7C77AC^GGGA7AAACAGA 

7CC77CC77AG7CCC7GGAC77AA7CAC7GAGAG777GGG7GG7GG7777GGA777AA7GACACA 

ACCTGTAGCATGCAGTGTTACTTAAGAC 






SEQUENCE ID NO. 151 



FILE NO. 901-912.GEN

GGATCCCTCCCCTTTTTAGACCATAC^ 
TC\7GGTGTTGGCGGGGAGTGTCTT7TAGCA7GCTAA7^ 
TGAGGA7AACCAGAGG7CACTCTCCTCA.CCATCT7GGTTTTGGTGGGTT7^ 
TTGCAACC^GTTTTATCAGCAAGATCTTTATGAGCTGTATCTTGTGCTGA 
. CGNAACTAAGAGTACCTAACCTCCTGCAAA^ 
GCCCCTATTOARATAGAGTNGYTCTTG^ 
TTAATTAAGGTAAGATAGKTCCITGSATATGTG^ 
AGGTGCTTGGASCTGC^.GCCAGTAAACAAGTTTTC^^ 
AAAGGATAAGTACAATTG7GTATGTTGGGATGAACAGAGAGAATG 

AAAAGA.GAGGACCTGAATGCCTTCA.GTGAACAATGATAGATAATCTAGACTTTTAAACT 

TTCCTGTACATTGTTTTTTCTTGCTTCAGGTTTTTAGAAGTCATAGTGACGGGTGTGTTG7TAA7 

CCCAGG7CTAACCG7TACCTTGA7TC7GC7GAGAATCTGA7TTACTGAAAATG7TTTTCTTG7GG 

TTA7AGAATGACAATAGAGAACGGCAGGAGCACAACGACAGACGGAGCCTTGGCCACCCTGANCC 

A7TA7CTAATGGACGACCCAGGG7AACTCCCGGC\GG7GG7GGANCAAGA7GAGGAAGAAGA7GA 

GGANCTGACATTGAAATATGNCGSCAAGCATGTGA7CA7GCTC7TTGKCCCTGTGAC7CTC7GCA 

TGGTGGTGGTCGTGGNTAGCATTAAG7CAG7CAGCT7TTA7ACCCGGAAGGA7GGGCAGCTG7AC 

G7A7GAGT7TKGTIT7A77A7TC7CAAASCC\G7G7GGCT777CTT7ACAGCA7G7C^7CA7Ca.C 

Cr7GAAGGCCTCTNCA7TGAJ^GGGGCA7GACT7AGCTGGAGAGCCCA7CCTC7GTGA7GG7CAGG 

AGCAGT7GAGAGANCGAGGGG77A77AC77C\7G7777AAG7GGAGAAAAGGAACAC 

TA7G7TTCCTGTATGGTATTACTGGA7AGGGC7GAAG77A7GCTGAA77GAACACA7AAA77C77 

TTCCACCTCAGGGN^\TTGGGCGCCC^^ 

TKGGNGGATTAAATTCCTGTC^TCCCCCT^ 

AAGAAGTAGCACrCGAATATAAJ\A7TT7CC7T7TAATTC7CAGCAAGGNAA 
GAAGGGTGCACCC^rrACAGA7GGAAC\A.7GGCAAGCGCACA7T7G 

TTC7TATCCCTGACACACGTGGTCCCNGCrGNTGTGTNC7NCCCCCAC7GANTAGGG77AGAC7G 
GACAGGCTTAAACT AATT C CAATTGGNT AA7T7AAAGAGAATNATGGGGTGAJVTGCIT7 GGGAGG 
AG7 CAAGGAAGAGNAGGTAGNAGG7 AACT7GAA7GA 

SEQUENCE ID NO. 152
FILE NO. 910-915.GEN

GG7G7 A ATACCAAG7A7TCN C CAA777G7GA7AAACT77 CA7TGGAAAG7GACCACCC7C C77GG 
TTAATACATTGTCTGTGCCTGCTT7C\C\CTACAGTAGCAC\GTTGAGTGTTTGCCCTGGAGACC 
ATATGACCCATAGAGCTTAAAATATTCAGTCTGGCTTTTTACAGAGATGTTTCTGACT 
TAGAAAJ\TCAACCCAA.CTGGTTTAAATAATGCA.C\TACITTCTCT 

TAGNCAGTCCAGATTAGTASGGTGGCTTC\CGTTC\TCCAAGGACTCAATCTCCTTCTTTCTTCT 
TTAGCTTCTAACCTCTAGCTTACTTCAGGGTCC^GGCTGGAGCCCTASCCTTCATTTCTGAC\GT 
AGGAAGGA.GTAGGGGAGAAAAGAAC\TAGGACATGTCA.GCAGAATTCTCTCCTTAGAA 
ACACAACACA7C7 C C CT AGAAGT CAT7GC C CTT AGTTGTTCT CAT AG C CAT CCT AAAT AT AAGGG 
AGTC^GAAGTAAAGTCTX2<OT t GGCTGGGAATATTGG<^ 

TGAGAAACAAGGGGAAGATGGATATGTGACATTATCTTAAGACAACrCCAGTTGCAATTACTCTG 
CAGATGAGAGGCACTAATTATAAGCCATATTACCTTTCTTCTGACAA.CCACTTGTCAGCCCNCGT 
GGTTTCTGTGGCAGAATCTGGTTCYATANCAAGTTCCTAATAANCTGTA5 
GAGGTATTATAATTATTTCAATATAAAGCAC^ 

AAGTCCTTCTTTCCATATGTTAGACATTTTCTTTGAAGCAATTTTAGAGTGTAGCTGTTTTTCTC 
AGGTTAAAAATTCTTAGCTAGGATTGGTGAGTTGGGGAA^ 

TTAAGAAAAAGAAAATTCTGTGTTGGAGGTGGTAATGTGGKTGGTGATCTYCATTAAC^ 

TAGGGCTTTKGKGTTTGKTTTA7TGTAGAATCTATACCCCATTCANAGAAGATACCGA.GACTGTG 

GGCCAGAGAGCCCTGCACTCAATTCTGAATGCTGCC^TCATGATCAGNGTC\TTGTWGTCATGAC 

TANNCTCCTGGTGGTTCWGTATAAATACAGGTC 

TTTCCACCCTGTTCTTCTTATGGTTGGGTATTCTTC^ 

AAJ\AATG7777G7Cr7C7AGAGA7AAG7TAA7T777AG7777CT7CC7CCT 

CAAAAAA7ACAAAAAGGAAGCCAGG7GC\7G7G7AA7GCCAGGCTCAGAGGCT 

7 CGC77GGGC C CAGGAGTT CACAAGCAGCTTGGGCAAC G7 AGCAAGAC C CTG C C7 C7A7TAAAGA 

AAACAAAAAACAAATATTGGAAGTATTTTATA7GCATGGAATCT 

G7AAAATA7A7A7A77A7GA77AGNTATCAAGATTTAG7GATAA77TATGTTA77TTGGGA777C 

AATGCCTTTTTAGGCCATTGTCTCAAM^^ 

ATAAACATTTCCATATAATAGCACAATCT^ 

AGGGCCT7GCCCTNYGACCCAGGOTGGAG7GAAG7GCAG7GGCACGATTT7 

SEQUENCE ID NO. 153

FILE NO. 917-936.GEN

atgtttgacaatttctccgttccaccctt^^^^ 
cttttggatatatgtgtaag7gtggta7gctg7ctaatc 

ccct^cajtctggacmaj^^^ 

tttctgctctcagctagcttgccacctagaaagactggtt^ 

ggac<i-atgtttaajlatgcagtttctcagg 

tctgggagagggc^gagatatttgcgatttt^ 

scaaataggtagcgtaaagaaatgacaggtgttaaatttaggatgggcatcgcttgtatgccggg 

agaagcacacgctgggcccaatttatataggggctttcgtcctcagct 

ccgacaacctacgccac<:kctctgggcggattcgrtca.gktggc^aagsccaggtggagctctc^ 

kttgtccgcgcaatggtttctgcagggcgga.ggccccgcgcccttccrcgtggctccrccccrcc 

tccgtgggccgnccgccaacgacgccagagcc^ 

gggcctgggacaggcagctccggggtccncgnnwtnacat^ 

gaagga-acctgakctacgacccgcggcggcagcggggcggcggggaagcgtatgtgcgtgatggg 

gagtccggccaagccaggaaggcaccgcggacatgggcggccgcgggcagggnccggncctttgt 

ggccgcccgggccgcgaagccggtgtcctaaaagatgaggggcggggcgcggccggttggggctg 

gggaaccccgtgtgggaaaccaggaggggcggcccgtttctcgggcttcgggcgcggccgggtgg 

agagagattccggggagccttggtccggaaatgctgtttgctcgaagacgtctcagggcgcaggt 

gccttgggccgggattagtagccgtctgaactggagtggagtaggagaaagaggajigcgtcttgg 

gctgggtctgcttga.gcaj\ctggtgaaa.ctccgcgcctcacgccccgggtgtgtccttgtcc\^ 

ggcgacgagcattctgggcgaagtccgcacgcctcttgttcgaggcggaagacggggtcttgatg 

ctttctccttggtcgggactgtctcgaggcatgcatgtccagtgactcttgtgtttgctgctg^ 

t c c ctgtcagattctt ctcac cgttg7ggtcagct c7gc777aggca7a77aa7c ca7ag7ggag 

gctgggatgggtgagagaa.ttgaggtgall'll'l'ccataattcaggtgagatgtgattagagtycg 

ga7cc7ncgg7gg7ggcagaggc77accaagaaacact 

ccagggaa7aaag7gtgaag77gactaggaggt777cag777aag 

agaaa7aaggaag77aggaagaaagacc7gg777agagaggagggcgagga 

gt cagtttggaagtgg cag caggtgaaaatgg c c7g7gaacagg ac7ggagc7gaaaacaggaa7 

caattccatagatttccagttgatgttggagcagtggag 

gaggccaagccaaacacttaggaacacttncnacgagggggtggaagaagagcaagga^ 

gaggagaatgagtgtggttggagaaccaccacagcncagggtcgccaganctgaggaaggggagg 

gaagcttatcgagkamsgw cracmkcgagttggcagggat 




SEQUENCE ID NO. 154 
FILE NO. 930-919.GEN

GTCTTTCCCATCTTCTCCACAGAGTTTGTGCCTTACATTAT 
CATTGTCAGCTCTTCC\ATCTCC\TC 

GTACAGCCTTTTATGGACCAATTAGCATTCCATCAATTTTAT^^ 

TCCCATGGATGTTTCTTCTTTGACTATAACAAAATCT^ 

CACATCTAACAAATCAAGATCCCCGGCTGGACTTTTGGAGGT^ 

CTTGCACTATTGGACTTTGGAAGGAGGTGCCTATAGAAAACGATTTTGAACATACTTCATCGCAG 

TGGACTGTGTCCTCGGTGCAGAAACTACCAGATTTGAGGGACGAGGTCAAGGAGATATGATAGGC 

CCGGAAGTTGCTGTGCCCCATCAGCAGCTTGACGCGTGGTCACAGGACGATTTTCACTGACACTG 

CGAACTCTCAGGACTACCGTTACCAAGAGGTTAGGTGAAGTGGTTTAAACCAAACGGAACTCTTC 

ATCTTAAACTACACGTTGAAAATCAACCCAATAATTCrGTATTAACTGAATTCTGAA 

GAGGTACTGTGAGGAAGAGCAGGCACCACCAGCAGAATGGGGAATGGAGAGGTGGGCAGGGGTTC 

CAGCTTCCCTTTGATTTTTTG 




SEQUENCE ID NO. 155
FILE NO. 932-943.GEN

GGATCCGCCCGCCTTGGCCTCCCAAAGTGCTGGGATTAC^GGCATGAGCCACCGCTCCTGGCrGA 

GTCTGCGATTTCTTGCCAGCTCTACCC\GTTGTGTCATCrTAAGCAAGTCACTGAACTTCT 

ATTCCCITCTCCTNNWG7AAJ\A7AAGNA7GTT^ 

GATAAGATGACATTATAGAATNTNGO^AAATTAAAAGCGCTAGACAAATGATTTTA 

AAAGATTAGNTTGAGTTTGGGCCAGCATAGAAAAAGGAATGTTGAGAACATTCCNTT^ 

CTC^GCYCCCCTTTTGSTGK^WAATC^GANNGTCATNNAiWTA 

TTGGTTGTCTCAGGCGGTTCCTACTTATTGCTAAAGAGTCCTACCrTGAGCTTATAGTAAATTTG 

TCAGTTAGTTGAAAGTCGTGACAAATTAATACATTCCTGGTTTACAAAT^ 

TGATTGGTNTAAATGNATTTACTAGGATTTAACTAACAATGGATGACCTGGTGAAATCCTA 

AGACCTAATCTGGGAGCCTGC^GTGACAACAGCCTC^ 

GAGAACA.CATGAAAGAMMGGT77GWNTCTGNT7AOT 

TATAATTGTOTGMACAAAGTTCTGTTTTTCTT^ 

TGTGAAA.CAGTATTTCTATACAGNTGCTCCAATGACAGAGTNACCTGCACCGTTGTCCTACTTCC 
AGAATGCAC\GATGTCTGAGGACAACCA.CCTGAGCAATACTGTACGTAGCCAGGTACAGCGTC\G 
TYTCTNAAACTGCCTYYGNCAGACTGGATTCACTTATCATCTCCCCTCACCTCTGAGAAATGCTG 
AGGGGG S T AGGNAGGGGTTT CT CT ACTTNAC CAC\T TTNAT AATT ATTTTTGGGTG AC C77CAG C 
TGATCGCTGGGAGGGACACAGGGCTTNTTTAACA.CATAGGGTGTTGGATACAGNCCCTCCCTAAT 
TCACATTTCANC 







SEQUENCE ID NO. 156
FILE NO. 951-952.GEN

CrGCAGCTTTCCTTTAAACTAGGAAGACTTGTT 

AGCAAATAGCAGTCAAJ\CCC^JVTGAAATT^ 

GTTGTCTCCCCCACCCCC\CCAGTTCACCTGCCAT^ 

GTAAAAAGAGACAAAAAACATTAAACTTl'lTTCCTTCGTTAATTCCTCCCTA 

AGTTTAGCCCATA-CATTTTATTAGATGTCri'lTATGTTTTTCTTTTNCTAGATTTAGTGGCTG^ 

TNGTGTCCGAAAGGTCCACTTCGTATGCTGGTTGAAACAGCTC^ 

TCCAGCTCTCATTTACTCCTGTAAGTATTTGGAGAATGATATTGAATTAGTAATCAGNGTAGAAT 

TTATCGGGAACTTGAAGANATGTNACTATGGCAATTT 

GNATCCCTGGACTCCTGNAG 



SEQUENCE ID NO. 157



FILE NO. 983-1011.GEN

CCCCG7CNATGCATACTTTGTGTGTC 

ATGGTGTGGTTGGTGAATATGGCAGAAGG^ 

CAAGTATAATGCAGAAAGTAGGTAACTYYYNTTAGATAMNAT 

ATAAGCTAACAGTATAGNAATGTTTTTATCGTCTTTCT^ 

TTGAGAACTATGATAATGCCCAGTAAATACNCAG 

CCCAACAATACNGTCAAAGCATCCTAGGTTA^^ 

GAJ^AGGTTCAGGCTGAGGTTATGATTGGGTTTGGGTTTTGGGNNNGTTTTTTATAAGTCATGATT 
TT AAAAAGAAAAAAT AAACT CT CT C CAAACATGT AAAAGT AAGAAT CT C CT AAA 




SEQUENCE ID NO. 158 
FILE NO. 925-913.GEN

CAGGAGTGGACTAGGTAAATGNAAGNTG7TTTAAAGA 
CANCTCTAATGCTCASC^CT^^ 

NTGAGANCAGCCTGGGCAA^ATGGCGAAACCCTGTCT 

GCGTGGTGGCGCRCACGCGTGGTTCCACCTACTCAGGAGGCNTAAGCACGAGNAOTNCTT 

CAGGAGGCAGAGGNTGTGGTGARCTGAGATCGTGCCACTGCACTCCA 

GACCCTGTCTCCNNNAAGAAAAAAAAAATCT^ 

ATTGAAATGCTTCTYTTCTAGGTCATCCATGCCTGGCTTATTATATCATCTCTATTGTTGCTGCT 

CTTTTTTACATTCATTTACTTGGGGTAAGTTGTGAAATTTGG 

CCTNNGTGCTGTGTAGCTATCATTTAAA 

ATTGTNTCCACATATAGGTCATACTTGGTAT^^ 

TCTTCTGTNGCTCCTNGCTTATAATAAGTAGAACTGAA^ 

AGCCTTTGGGGAAGGATTATATAGCCTTCTAGTAGGAAGTCTTGTGCNATCAGAATGTTTNTAAA 
GAAAGGGTNTCAAGGAATNGTATAAANACCAAAAATAATTGAT 

SEQUENCE ID NO. 159
FILE NO. 849-892.GEN

GTTNTCCNAACCAACTTAGGAG*rrTGGACCTGGGR^ 

CTNCAGTTGAGCCGTGATTGCACCCACTTTACTCCAAGCCTGGGCAACCAAAATGAGACACTGGC 
TCCAAACACAAAAACAAAAACAAAAAA\GAGTAAATTAAOT 

TAGCACAGTTGATATAGGTTATGGTAAAATTATAAAGGTGGGANATTAATATCTAATGTTTGGGA 

GCCATCACATTATTCTAAATAATGTTTTGGTGGAAATTATTGTACATCTTTTAAAATCTGTGT^ 

TTTITTTTCAGGGAAGTGTTTAAAACCTATAACGTTGCTGTGGACTACATTA 

GATCTGGAATTTTGGTGTGGTGGGAATGATTTCCATTCACTGGAAAGGTCCACrTCGACTCCAGC 

AGGCATATCTCATTATGATTAGTGCCCTCATGNCCCTGKTGTTTATCAAGTACCTCCCTGAATGG 

ACTGNGTGGCTCATCTTGGCTGTGATTTCAGTATATGGTAAAACCCAAGAC TGA TAATTrGTTTG 

TCACAGGAATGCCCCACTGGAGTGTTTTCTTTCCTC^TCTCTTTATCTTGATTTAGAGAAAA 

TAACGTGTACATCCCATAACTCTTCAGTAAATCATTA^ 

GATTTCGGCTGGGCATGGTAGCTCATGCCTGTAATCTTAGCACTTTGGGAGGCTGAGGCGGGCAG 
ATCACCTAAGCCCAGAGTTCAAGACCAGCCTGGGCAACATGGCAAAACCT 

TACAAAAATTAGCCGGGCATGGTGGTGCACACCTGTAGTTCCAGCTACTTAGGAGGCTGAGGTGG 
GAGGATCGATTGATCCCAGGAGGTCAAGUCTGCAG 



